1.1 MEANING OF STATISTICS

People view Statistics in many different ways. Generally it is considered to be a subject that deals
with percentages, charts, graphs, averages and tables. Some people think that Statistics is a subject
consisting of rules, methods and techniques of collecting and presenting large amounts of numerical
information, while other people think that it is a subject of making inferences about the population on the
basis of sample information.


The word "Statistics", which comes from the Latin word status, meaning a political state, originally
meant information useful to the state, for example, information about the sizes of populations and armed
forces. But this word now has different meanings.

In the first place, the word statistics refers to "numerical facts systematically arranged". In this
sense, the word statistics is always used in the plural. We have, for instance, statistics of prices, statistics
of road accidents, statistics of crimes, statistics of births, statistics of educational institutions, etc. In all
these examples, the word statistics denotes a set of numerical data in the respective fields. This is the
meaning the man in the street gives to the word Statistics, and most people usually use the word data
instead.
Example 1.1 In the following examples, the facts and figures usually called Statistics presented in
the media almost every day are given:
i) Children who brush their teeth with brand X have 60% fewer cavities.
ii) The Bureau of Census projects the population of Pakistan to be 170.1 million in the year 2010.
iii) Eight out of ten Pakistanis do not [...].
iv) The prevalence of diabetes is nearly [...] times as high in overweight people as compared to
normal weight people.
v) In 1980 it was estimated that [...] of people had tried any sort of drug, whereas in 2008 it
was estimated that 10% had done so.

In the second place, statistics is defined as a discipline that includes procedures and techniques
used to collect, process and analyse numerical data to make inferences and to reach decisions in the
face of uncertainty. It should of course be borne in mind that uncertainty does not imply ignorance.


1.1.1 Use of Statistical Information. Statistical information is and can be used for a variety
of purposes. Some of them are:

i) to inform the general public;
ii) to explain things that have happened;
iii) to justify a claim;
iv) to provide general comparisons;
v) to predict the decision regarding future outcomes;
vi) to estimate the unknown quantities;
vii) to establish association / relationship between factors.


Hence Statistics is a subject which is much more than just numbers. It tells us what is done
with numbers. The following three examples further explain how Statistics may be used:


Example 1.2. Suppose we want to determine the best teacher at Govt. College University.
How should we decide this? This could be done by asking Govt. College University students who their
best teacher is. To do so, we collect the data, analyze the results and make the decision. Now the
questions are:

i) should we survey every student?
ii) how will the survey be conducted?
iii) how will the data be analyzed?
iv) how will the best teacher be determined? etc.

In order to answer these and other questions, Statistical techniques are used.


Example 1.3 A TV station claims that an advertisement of a product on their channel attracts more
customers compared to all other TV channels. If this claim is based on data, then it can be used to
market the TV channel. Suppose we have doubts about the claim. In order to remove the doubts, we
might gather relevant information, analyse the results using an appropriate statistical technique and
make a decision regarding the claim.


Example 1.4 Suppose the University of the Punjab is planning an expansion program of its
facilities. To draw up an effective course of action, the University authorities decide that they need to
answer this question: how many students will we need to accommodate over the next ten years?
The question can be broken down into many smaller questions. How many college students will
then be in the Punjab? How many will want to attend the University of the Punjab? etc. Once [...]
Statistical methods can assist in evaluating and planning the expansion program.


1.1.2 Characteristics of Statistics. The definition stated above indicates that statistics is a subject
in its own right. It may therefore be desirable to know the characteristic features of statistics in order to
appreciate and understand its general nature. Some of its important characteristics are given below:


i) Statistics deals with the behaviour of aggregates or large groups of data. It has nothing to do
with what is happening to a particular individual or object of the aggregate.

ii) Statistics deals with aggregates of observations of the same kind rather than isolated figures.

iii) Statistics deals with variability that obscures underlying patterns. No two objects in this
universe are exactly alike. If they were, there would have been no statistical problem.

iv) Statistics deals with uncertainties, as every process of getting observations, whether controlled
or uncontrolled, involves deficiencies or chance variation. That is why we have to talk in
terms of probability.

v) Statistics deals with those characteristics or aspects of things which can be described
numerically either by counts or by measurements.

vi) Statistics deals with those aggregates which are subject to a number of random causes, e.g. the
heights of persons are subject to a number of causes such as race, ancestry, age, diet, habits,
climate and so forth.

vii) Statistical laws are valid on the average or in the long run. There is no guarantee that a certain
law will hold in all cases. Statistical inference is therefore made in the face of uncertainty.

viii) Statistical results might be misleading and incorrect if sufficient care in collecting, processing
and interpreting the data is not exercised or if the statistical data are handled by a person who
is not well versed in the subject matter of statistics.


1.1.3 Descriptive and Inferential Statistics. Statistics, as a subject, may be divided into
descriptive statistics and inferential statistics.

Descriptive statistics [...] of the [...] of data, their graphical displays [...] quantities that
provide information about the centre of the data and indicate the spread of the [...].

Inferential statistics [...] part of the data, known as a sample. This [...] of statistical hypotheses.
This phase of statistics is based on probability theory, as the inferences, which are made on the basis
of sample evidence, cannot be absolutely certain.


Descriptive Statistics

i) A cricket player wants to find his score average for the last 20 games.
ii) Aamir wants to describe the variation in his four test scores in Statistics.
iii) Mrs. Rashid wants to determine the average weekly amount she spent on groceries in the
past 6 months.

Inferential Statistics

i) A cricket player wants to estimate his chance of scoring based on his current season average.
ii) Based on the first four test scores, Aamir would like to predict the variation in his final
Statistics test.
iii) Based on the last six months' grocery bills, Mrs. Rashid would like to predict the average
amount she will spend on groceries for the upcoming year.
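
The contrast between the two branches may be illustrated with a short Python sketch. The four test scores below are hypothetical values chosen only for illustration, and the standard error is used here as one simple, assumed way of expressing the uncertainty that any inferential statement carries; it is not a method prescribed by the text.

```python
import statistics

# Hypothetical data: four test scores in Statistics (not taken from the text).
test_scores = [72, 85, 64, 79]

# Descriptive statistics: summarise the scores that were actually observed.
mean_score = statistics.mean(test_scores)    # centre of the data
spread = statistics.stdev(test_scores)       # sample standard deviation (spread)

# A first inferential step: the standard error of the mean indicates how
# uncertain the observed average is as a guide to future performance.
standard_error = spread / len(test_scores) ** 0.5

print("average score:", mean_score)
print("standard deviation:", round(spread, 2))
print("standard error of the average:", round(standard_error, 2))
```

The first two quantities only describe the four scores in hand. The third is a statement about how far the observed average might lie from the long-run average, which is an inference and therefore cannot be absolutely certain.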




A population may be finite, such as the heights of all college students, or infinite, such as [...].
The number of observations in a finite population is called the size of
the population and is denoted by the letter N. Numerical quantities describing a population are called
parameters, customarily represented by Greek letters. It is important to note that in statistics the word
population is a technical term not necessarily referring to all the people in a specified area, but
denoting the aggregate of measurements or counts of some characteristic for the entire group of objects or
individuals.


A sample is a part or a subset of a population. Generally it consists of some of the observations, but
in certain situations, it may include the whole of the population. The number of observations included in a
sample is called the size of the sample and is denoted by the letter n. A numerical quantity computed from
a sample is called a statistic, which is usually represented by an ordinary Latin letter. The information
derived from sample data is used to draw conclusions about the population.
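
The distinction between a parameter and a statistic can be sketched in Python as follows. The population of 500 heights is simulated and entirely hypothetical, and the names `mu` and `x_bar` are illustrative choices for the population mean and the sample mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so that the sketch is reproducible

# Hypothetical finite population: heights (in cm) of N = 500 college students.
population = [round(rng.gauss(170, 8), 1) for _ in range(500)]
N = len(population)                  # size of the population
mu = statistics.mean(population)     # a parameter: computed from the whole population

# A sample of size n drawn without replacement from the population.
n = 25
sample = rng.sample(population, n)
x_bar = statistics.mean(sample)      # a statistic: computed from the sample only

print("population size N =", N, " parameter mu =", round(mu, 2))
print("sample size n =", n, " statistic x_bar =", round(x_bar, 2))
```

In practice the parameter is usually unknown, and the statistic computed from the sample is used to draw a conclusion about it.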


Example 1.5 State whether each of the following is a population or a sample.
i) Total number of absentees by all students in a college during last month.
ii) Number of colour TV sets owned by all families in Lahore.
iii) Monthly salaries of all employees of a company.
iv) Wheat yield per acre for 5 pieces of land.
v) Number of computers sold during the [...] at all the computer stores in Lahore.

Solution
i) Population
ii) Population
iii) Population
iv) Sample
v) Population


1.1.5 Importance of Statistics. Statistics is perhaps a subject that is used by everybody. The
following functions and uses of statistics in most diverse fields serve to indicate its importance.

i) Statistics assists in summarizing the larger sets of data in a form that is easily understood.

ii) Statistics assists in the efficient design of laboratory and field experiments as well as surveys.

iii) Statistics assists in sound and effective planning in any field of inquiry.

iv) Statistics assists in drawing general conclusions and in making predictions of how much of a
thing will happen under given conditions.

v) Statistical techniques, being powerful tools for analysing numerical data, are used in almost
every branch of learning. In the biological and physical sciences, Genetics, Agronomy,
Anthropometry, Astronomy, Physics, Geology, etc. are the main areas where statistical
techniques have been developed and are increasingly used.

vi) A businessman, an industrialist and a research worker all employ statistical methods in their
work. Banks, Insurance companies and Governments all have their statistics departments.

vii) A modern administrator, whether in the public or private sector, leans on statistical data to provide
a factual basis for decision.

viii) A politician uses statistics advantageously to lend support and credence to his arguments
while elucidating the problems he handles.

ix) A social scientist uses statistical methods in various areas of the socio-economic life of a nation. It
is sometimes said that "a social scientist without an adequate understanding of statistics is
often like the blind man groping in a dark room for a black cat that is not there".


1.2 OBSERVATIONS AND VARIABLES


In statistics, an observation often means any sort of numerical recording of information, whether
it is a physical measurement such as height or weight, a classification such as heads or tails, or an answer
to a question such as yes or no.


1.2.1 Variables. A characteristic that varies with an individual or object is called a variable. For
example, age is a variable as it varies from person to person. A variable can assume a number of
values. The given set of all possible values that the variable can assume is called its domain. If, for
a given problem, the domain of a variable contains only one value, then the variable is referred to as
a constant.

Variables may be classified into quantitative and qualitative according to the form of the
characteristic of interest. A variable is called a quantitative variable when a characteristic can be
expressed numerically such as age, weight, income or number of children. On the other hand, if the
characteristic is non-numerical such as gender, eye-colour, quality, intelligence, poverty,
etc., the variable is referred to as a qualitative variable. A qualitative characteristic is also
called an attribute. An individual or object with such a characteristic can be counted or enumerated
after having been assigned to one of the mutually exclusive classes or categories.

1.2.2 Discrete and Continuous Variables. A quantitative variable may be classified as discrete or
continuous. A discrete variable is one that can take only a discrete set of integers or whole numbers, that
is, the values are taken by jumps or breaks. A discrete variable represents count data such as the number
of persons in a family, the number of rooms in a house, the number of deaths in an accident, the income
of an individual, etc.

A variable is called a continuous variable if it can take on any value, fractional or integer, within a
given interval, i.e. its domain is an interval with all possible values without gaps. A continuous variable
represents measurement data such as the age of a person, the height of a plant, the weight of a
[...], the temperature at a place, etc.


A variable, whether countable or measurable, is generally denoted by some symbol such as X or
Y, and X_i or Y_j represents the ith or jth value of the variable. The subscript i or j is replaced by a
number such as 1, 2, 3, ... when referring to a particular value.
Example 1.6 Identify each of the following as examples of (1) Attribute, (2) Discrete,
(3) Continuous variables.

a) the hair colour of children;
b) length of time required for a wound to heal;
c) the number of telephone calls arriving at a switch board per 1 hour period;
d) the breaking strength of a given type of string;
e) the number of questions answered correctly on a test;
f) the number of stop signs in the city of Lahore;
g) the colour of your eye;
h) the number of children in your family.

Solution
a) Attribute
b) Continuous
c) Discrete
d) Continuous
e) Discrete
f) Discrete
g) Attribute
h) Discrete

1.2.3 Measurement Scales. By measurement, we usually mean the assigning of numbers to
observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly
mentioned below:


Nominal Scale. The classification or grouping of the observations into mutually exclusive
qualitative categories or classes is said to constitute a nominal scale. For example, students are classified
as male and female. Numbers may also be used to identify these two categories. Similarly, rainfall
may be classified as heavy, moderate and light. We may use the numbers 1, 2 and 3 to denote the three classes
of rainfall. The numbers, when they are used only to identify the categories of the given scale, carry no
numerical significance and there is no particular order for the grouping.


Ordinal or Ranking Scale. It includes the characteristic of a nominal scale and in addition has the
property of ordering or ranking of measurements. For example, the performance of students (or players)
is rated as excellent, good, fair or poor, etc. Numbers 1, 2, 3, 4, etc. are also used to indicate ranks. The
only relation that holds between any pair of categories is that of "greater than" (or more preferred).


Interval Scale. A measurement scale possessing a constant interval size (distance) but not a true
zero point is called an interval scale. Temperature measured on either the Celsius or the Fahrenheit scale
is an outstanding example of an interval scale because the same difference exists between 20°C (68°F) and
30°C (86°F) as between 5°C (41°F) and 15°C (59°F). It cannot be said that a temperature of 40 degrees is
twice as hot as a temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic operations
of addition, subtraction, etc. are meaningful.

Ratio Scale. It is a special kind of an interval scale where the scale of measurement has a true zero
point as its origin. The ratio scale is used to measure weight, volume, length, distance, money, etc. The
key to differentiating interval and ratio scales is that the zero point is meaningful for the ratio scale.
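
The difference between the interval and the ratio scale can be checked numerically. The following Python sketch uses the temperatures quoted above; the weights and the approximate conversion factor from kilograms to pounds are assumptions introduced only for the illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: equal differences remain equal whichever scale is used.
diff_f_1 = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)
diff_f_2 = celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5)

# ... but ratios are not preserved, because the zero point is arbitrary.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# Ratio scale: weight has a true zero, so the ratio survives a change of unit.
KG_TO_LB = 2.20462   # approximate conversion factor (assumed for illustration)
ratio_kg = 40 / 20
ratio_lb = (40 * KG_TO_LB) / (20 * KG_TO_LB)

print("Fahrenheit differences:", diff_f_1, diff_f_2)
print("temperature ratios (C, F):", ratio_c, round(ratio_f, 3))
print("weight ratios (kg, lb):", ratio_kg, round(ratio_lb, 3))
```

The ratio of the two temperatures changes with the scale used, which is why "twice as hot" has no meaning on an interval scale, whereas "twice as heavy" holds in every unit of weight.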


Examples of Measurement Scales

Nominal-level data      Ordinal-level data               Interval-level data   Ratio-level data

Gender (Male, Female)   Grades (A, B, C, D, F)           Temperature           Age
Eye colour              Position (1st, 2nd, 3rd, etc.)   IQ score              Weight
Religion                Ranking of cricket players       SAT score             Height
Specialization          Rating                                                 Time
                        (poor, good, excellent)
Nationality             Socio-economic status                                  Salary
                        (poor, middle class, rich)
                                                                               Distance


1.2.4 Errors of Measurement. Experience has shown that a continuous variable can never be
measured with perfect fineness because of certain habits and methods of measurement, the
instruments used, etc. The measurements are thus always recorded correct to the nearest units and hence
are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a
student's weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between
59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie
between 59.995 and 60.005 kg. There is thus a difference, however small it may be, between the
measured value and the true value.
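
The limits within which the true value lies follow directly from the number of decimal places recorded. A minimal Python sketch is given below; the function name `true_value_bounds` is hypothetical, and the recorded value is passed as a string so that trailing zeros such as those in 60.00 are preserved.

```python
from decimal import Decimal

def true_value_bounds(recorded):
    """Return the (lower, upper) limits of the true value for a measurement
    recorded correct to its last decimal place, e.g. "60" or "60.00"."""
    value = Decimal(recorded)
    # Unit of the last recorded place: 1 for "60", 0.01 for "60.00".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit

print(true_value_bounds("60"))      # weight recorded to the nearest kilogram
print(true_value_bounds("60.00"))   # weight recorded to the nearest 0.01 kg
```

The more decimal places that are recorded, the narrower the interval becomes, but it never shrinks to a single point.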


If the observed value and the true value are denoted by x and x + ε respectively, then the difference
(x + ε) - x, i.e. ε, is the error. This error involves the unit of measurement of x and is therefore called
an absolute error. An absolute error divided by the true value is called the relative error. Thus

relative error = ε / (x + ε),

which, when multiplied by 100, is the percentage error. These errors are independent of the units of
measurement of x. It ought to be noted that an error has both magnitude and direction and that the word
error in statistics does not mean mistake, which is a chance inaccuracy.
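
These definitions can be sketched in Python as follows, taking the observed value as x and the true value as x + ε. The figures (an observed weight of 60 kg against a true weight of 60.3 kg) and the function names are hypothetical; in practice the true value is not known exactly.

```python
def absolute_error(observed, true):
    """epsilon: the difference between the true value (x + epsilon) and the observed value x."""
    return true - observed

def relative_error(observed, true):
    """Absolute error divided by the true value."""
    return absolute_error(observed, true) / true

def percentage_error(observed, true):
    """Relative error multiplied by 100."""
    return 100 * relative_error(observed, true)

# Hypothetical measurement expressed in kilograms and again in grams.
abs_kg = absolute_error(60, 60.3)
abs_g = absolute_error(60_000, 60_300)
rel_kg = relative_error(60, 60.3)
rel_g = relative_error(60_000, 60_300)

print("absolute error:", round(abs_kg, 3), "kg =", abs_g, "g")
print("relative error:", round(rel_kg, 5), "in either unit")
print("percentage error:", round(percentage_error(60, 60.3), 3))
```

The absolute error changes when the unit changes, whereas the relative and percentage errors do not, since the unit cancels in the division.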


An error is said to be biased when the observed value is consistently and constantly higher or lower
than the true value. Biased errors arise from the personal limitations of the observer, the imperfection in
the instruments used or some other conditions which control the measurements. These errors are not
revealed by repeating the measurements. They are cumulative in nature, that is, the greater the number of
measurements, the greater would be the magnitude of error. They are thus more troublesome. These
errors are also called cumulative or systematic errors.


An error, on the other hand, is said to be unbiased when the deviations, i.e. the excesses and
defects, from the true value tend to occur equally often. Unbiased errors are revealed when measurements
are repeated and they tend to cancel out in the long run. These errors are therefore compensating and are
known as random errors or accidental errors.
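
The different behaviour of the two kinds of error under repetition can be seen in a small simulation. The Python sketch below is illustrative only: the true value, the size of the bias and the normal distribution of the random errors are all assumptions.

```python
import random

rng = random.Random(0)   # fixed seed so that the sketch is reproducible

TRUE_VALUE = 60.0        # assumed true weight in kg
BIAS = 0.2               # assumed constant error, e.g. a badly set balance
n = 10_000               # number of repeated measurements

# Random (unbiased) errors: excesses and defects occur about equally often.
random_errors = [rng.gauss(0, 0.5) for _ in range(n)]
unbiased = [TRUE_VALUE + e for e in random_errors]
biased = [TRUE_VALUE + BIAS + e for e in random_errors]

def total_error(measurements):
    """Sum of the deviations of the measurements from the true value."""
    return sum(m - TRUE_VALUE for m in measurements)

def mean_error(measurements):
    """Average deviation of the measurements from the true value."""
    return total_error(measurements) / len(measurements)

print("unbiased: mean error =", round(mean_error(unbiased), 4))
print("biased:   mean error =", round(mean_error(biased), 4))
print("biased:   total error =", round(total_error(biased), 1))
```

The mean of the unbiased errors stays close to zero however many measurements are taken, while the total of the biased errors grows in proportion to their number.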


A measurement free from all classes of errors is considered as an accurate measurement. This is
why efforts are made to reduce the magnitude of errors to a minimum so that the level of accuracy at
which the measurements are recorded is increased. To achieve this end, a clear understanding of the
meaning of significant digits and the process of rounding off the numbers is very important in statistical
computations.


1.2.5 Significant Digits. Accuracy in measurements is related to significant digits. The significant
digits in a number are those that represent accurate and meaningful information. For instance, the
number 35 representing a continuous variable has two significant digits. In recorded measurements, all
digits except zeros are always significant. For zeros, we may state as:

i) Zeros are significant if they follow a decimal point and conclude a number, e.g. the
measurement 2.500 has four significant digits.

ii) Zeros are non-significant when they follow a decimal point but commence a number, e.g. the
measurements .04 and .000237 contain only 1 and 3 significant digits respectively.

iii) Zeros may or may not be significant when they lie entirely to the left of the decimal point,
where they may not represent measurement but may be used to simply locate the decimal
point. In such a case, a definite specification such as standard notation becomes necessary.
When any number is expressed as a product of a power of 10 and a number between 1 and 10,
it is said to be written in standard notation. For example, the number 75400 can have 3
significant digits when written in standard notation as 7.54 x 10^4. It can also have 5 significant
digits if written as 7.5400 x 10^4.

iv) Zeros are always significant when they occur within a series of significant digits, e.g. the numbers
20.3, .1001, 4.00507, etc., have 3, 4 and 6 significant digits respectively.

It should be remembered that

a) significant digits in a number are not affected by the location of the decimal point, e.g.
measurements recorded as [...], 269 or .000269 have only 3 significant digits;

b) in case of discrete data generated by the process of counting, the number of
significant digits is considered indefinite because the level of accuracy cannot be improved,
e.g. the number 157 has indefinite significant digits;

c) the rules for the determination of the number of significant digits are applicable to
continuous variables;

d) in the operations of addition and subtraction, all digit positions which are not significant in
any of the values being added or subtracted are not significant in the total or difference;

e) in the operations of multiplication and division, the number of significant digits in the result is
determined by the value with the smallest number of significant digits that enters into the
calculations.


1.2.6 Rounding off a Number. The process of rounding off, or simply rounding, a number means
that a certain number of digits, counted from the left, are to be retained and the last few digits are to be
(i) dropped in a decimal number or (ii) replaced with zeros in a whole number. The rules generally
adopted for rounding decimal numbers are as follows:

i) The last significant digit is increased by 1, if the first digit of the remainder to be dropped is
more than 5 or 5 followed by digits not all of which are zero, e.g. the numbers 2.145001 and
5.3772 are rounded off to three significant digits as 2.15 and 5.38 respectively.

ii) The last significant digit remains unaltered, if the first digit of the remainder to be dropped is 4
or less, e.g. the numbers 2.1548 and 7.3627 are rounded off to three significant digits as 2.15
and 7.36 respectively.

iii) When the digit to be dropped is exactly 5, the accepted practice is to increase the last
significant digit by 1 if it is odd and to leave it unaltered if it is even, e.g. the numbers 4.535 and
2.745 are rounded off to three significant digits as 4.54 and 2.74 respectively.

For rounding whole numbers, we can change the words "the first digit to be dropped" to "the first
digit to be replaced by zero" in the rules stated above.

The point to be made here is that the rules for identifying significant digits and the process of rounding
the numbers should be applied to final calculations and not to the intermediate results.
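
The rounding rules stated above correspond to what is commonly called "round half to even". A minimal Python sketch using the standard `decimal` module is given below; the function name `round_significant` is hypothetical, and numbers are passed as strings so that they are not altered by binary floating-point representation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_significant(number, digits):
    """Round a number (given as a string) to the stated number of significant
    digits, leaving an exact 5 to be settled by the odd/even rule."""
    value = Decimal(number)
    if value.is_zero():
        return value
    # Power of ten of the last digit to be retained.
    last_place = value.adjusted() - digits + 1
    return value.quantize(Decimal(1).scaleb(last_place), rounding=ROUND_HALF_EVEN)

for text in ["2.145001", "5.3772", "2.1548", "7.3627", "4.535", "2.745"]:
    print(text, "->", round_significant(text, 3))

# Whole number: the dropped digits are replaced by zeros.
print(75400, "->", int(round_significant("75400", 2)))
```

Note that Python's built-in `round` applied to a float such as 2.745 may not follow rule (iii), because the float is stored only approximately; working with `Decimal` values built from strings avoids this.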


1.3 COLLECTION OF DATA


The most important part of statistical work is perhaps the collection of data. Statistical data are
collected either by a complete enumeration of the whole field, called census, which in many cases would
be too costly and too time consuming as it requires a large number of enumerators and supervisory staff, or
by a partial enumeration associated with a sample, which saves much time and money. The sampling
methods, explained at length in later chapters, are increasingly used both in official and in private
inquiries to collect data.

When data are classified according to source, it is usual to make the following distinction.


Data that have been originally collected (raw data) and have not undergone any sort of statistical
treatment are called Primary data, while data that have undergone any sort of treatment by statistical
methods at least once, i.e. the data that have been collected, classified, tabulated or presented in some form for
a certain purpose, are called Secondary data.

The survey research includes the following important steps:
a) to define the objectives of the survey;
b) to define the variable(s) and the population of interest;
c) to define the data collection and data measuring schemes;
d) to determine the appropriate descriptive and inferential data analysis techniques.

A brief description of the methods generally adopted, either on census basis or on sample basis, for
collecting data is given below.
1.3.1 Collection of Primary Data. One or more of the following methods are employed to collect
primary data:

Survey Research


i) Direct Personal Investigation. In this method, an investigator collects the information
personally from the individuals concerned. Since he interviews the informants himself, the
information collected is generally considered quite accurate and complete. This method may
prove very costly and time consuming when the area to be covered is vast. However, it is
useful for laboratory experiments or localized inquiries. Errors are likely to enter the results
due to the personal bias of the investigator.

ii) Indirect Investigation or Personal Interviews. Sometimes the direct sources do not exist or
the informants hesitate to respond for some reason or other. In such a case, third parties or
witnesses having information are interviewed. As some of the informants are likely to
deliberately give wrong information, reliance is not placed on the evidence of one
witness only. Moreover, due allowance is to be made for personal bias. This method is
useful when the information desired is complex or there is reluctance or indifference on the
part of the informants. It can be adopted for extensive inquiries.


iii) Collection through Questionnaires. A questionnaire is an inquiry form comprising a
number of pertinent questions with space for entering the information asked. The questionnaires
are usually sent by mail and the informants are requested to return the questionnaires to the
investigator, after doing the needful, within a certain period. This method is cheap, fairly
expeditious and good for extensive inquiries. But the difficulty is that the majority of
respondents (persons who are required to answer the questions) do not care to fill the
questionnaires in and to return them to the investigators. Sometimes, the questionnaires are
returned incomplete and full of errors. In spite of these drawbacks, the method is considered
as the standard method for routine business and administrative inquiries. These days, the
answers to the questionnaires are very often recorded by trained enumerators to overcome
these difficulties. It is important to note that the questions should be few, brief, very simple, easy for
all respondents to answer, clearly worded and not offensive to certain respondents.


iv) Collection through Enumerators. Under this method, the information is gathered by
employing trained enumerators who assist the informants in making the entries in the
schedules or questionnaires correctly. This method gives the most reliable information if the
enumerator is well trained and tactful. It is considered the best method when a
large scale governmental inquiry is to be conducted. This method cannot be adopted by a
private individual or institution as its cost would be prohibitive to them.


v) Collection through Local Sources. In this method, there is no formal collection of data but
the agents or local correspondents are directed to collect and to send the required information,
using their own judgement as to the best way of obtaining it. This method is cheap and
expeditious, but gives only the estimates.


vi) Computer Interviews. Respondents enter data directly into a computer in response to
questions presented on the monitor.


Experimental Research

i) Laboratory experiments. Manipulation of the independent variable(s) in an artificial
situation. Basic designs consider the impact of only one independent variable.
ii) Field experiments. Manipulation of the independent variable(s) in a natural situation.

1.3.2 Collection of Secondary Data. The secondary data may be obtained from the following
sources:

Secondary Research


Internal Secondary Data. Data generated within the organization itself, such as sal 
reports, sales invoices, accounting records, 
i) Official, e.g. the publications of the Statistical Division, Ministry of Finance, the Federal and
Provincial Bureaus of Statistics, Ministries of Food, Agriculture, Industry, Labour, etc.
ii) Semi-Official, e.g. State Bank of Pakistan, Railway Board, Central Cotton Committee, Boards
of Economic Inquiry, District Councils, Municipalities, etc.
iii) Publications of Trade Associations, Chambers of Commerce, etc.
iv) Technical and Trade journals and newspapers.
v) Research organisations such as universities and other institutions.


1.3.3 Editing of Data. The primary data should be intensively checked at an early stage in order to
locate incomplete or inconsistent entries. If possible, the incomplete and defective questionnaires should
be returned to the respondents for amendments. In order to accept the secondary data as authoritative, one
should critically examine the reliability of the compiler and the suitability of the data. The scope and
object of the inquiry, sources of information and the degree of accuracy should also be carefully
scrutinized.

1.3.4 Uses and Misuses of Statistics. Statistics has numerous uses. It is difficult to find a field in
which Statistics is not used. Statistics plays an integral part in many disciplines, viz: Economics, Health,
Planning, Astronomy, Management, Business, Psychology, Agriculture, Sociology, Education, etc.

A few examples of how and where Statistics is used are:

i) In experimental science, the experiments [...] data, it must be collected and analyzed.
ii) In Government, many types of statistical data are collected all the time. This data can be used
for various types of planning and to inform the general public.
iii) In education, Statistics are used to evaluate the results and standards of education.

Statistical techniques are many times misused: to sell products that don't work; to prove something
that is not really true; to get the attention of the public by evoking fear and shock, etc. There are two sayings
about Statistics which explain the misuses of Statistics.

a) "Statistics can prove anything."
b) "There are three types of lies: lies, damned lies, and Statistics."

Statistics can also be misused in many ways such as using Non-Representative Samples, Small
Sample Size, Ambiguous Averages and dispersions, Detached facts, Implied Connections, Wrong and
Misleading Graphs, Wrong use of Statistical techniques, Serious violation of assumptions behind the
Statistical techniques and Faulty Surveys, etc.

EXERCISES


OBJECTIVE


Mark the following statements as 'True' or 'False'. If the statement is not true, then replace the
underlined words with words that make the statement true:

i) All numerical data are not Statistics.
ii) A Statistic is a summary measure computed for a population.
iii) A Parameter is a summary measure computed for a sample.
iv) Descriptive Statistics are used to make projections or estimates about the population.
v) Inferential Statistics is the study and description of data.
vi) A sample is typically a very large collection of individuals or objects of our interest.
vii) The thickness of the glass is an example of attribute data.
viii) The number of students in a class is an example of continuous data.
ix) The make of a car is an example of discrete data.
x) The main objective of Statistics is to collect a sample, analyze it and make inferences about
the unknown characteristics of the population from which the sample has been drawn.
SUBJECTIVE

1.1 Explain what is meant by Statistics. Give the important uses and limitations of statistics.
1.2 Define Statistics. Discuss, giving examples, the importance of the study of statistics and show
how it can help the extension of scientific knowledge. (P.U., B.A. (Hons.), 1960)
1.3 a) What do you understand by the term Statistics? Give its chief characteristics.
b) Give a brief account of the importance of statistics in different fields.
(P.U., B.A/B.Sc. 1971)
1.4 Comment upon the following statement of Sir Ronald Fisher: "Statistics may be regarded as
(i) the study of populations; (ii) the study of variation and (iii) the study of the methods of
reduction of data". (P.U., B.A. (Hons.), Part-I, 1964)
1.5 Comment on the statement given below:
"Statistics is concerned with understanding the real world through the information that we
derive from classification and measurement. Its distinctive characteristic is that it deals with
variability and uncertainty which is everywhere".
1.6 a) Define Statistics and explain its characteristics.
b) What are the uses of Statistics? (P.U., B.A. (Hons. in Econ.), [...])
1.7 Explain the difference between the following:


https://stat9943.blogspot.com 


INTRODUCTION TO STATISTICAL THEORY 


a) Statistics and Statistic. 

b) Population and Sample. 

c) Descriptive and Inferential Statistics. 
d) Quantitative and Qualitative Variables. 
e) Discrete and Continuous Variables.

f) Biased and Unbiased errors. 

g) Primary data and Secondary data. 

h) Nominal and ordinal scale, Interval and ratio scale. 


1.8 What is a statistical error? In what way does it differ from a mistake? Explain the difference
between absolute and relative errors. 


1.9 a) Define a Variable. Differentiate between a discrete and a continuous variable.
b) Classify the following variables as discrete or continuous:
i) The number of students attending a class.
ii) The amount of milk produced by a cow.
iii) The number of heads in the toss of 6 coins.
iv) The yearly income of a College Professor.
v) The age of a shopkeeper.
vi) The weight of a college student.
vii) The number of petals on a flower.
viii) The life times of television tubes produced by a company.
іх). Temperature recorded every Nt at a weather bureau. 
x) € Bos market. 


1.10 а sciam 
i) Colour of eyes.
ii) Number Pod sold in the last month. 
iii) Marital status of faculty members.
iv) Student's weight. 
v) Lifetime of car batteries. 
vi) Number of burgers sold by a fast food shop. 
vii) Brand of cars. 
1.11 Classify each as nominal-level, ordinal-level, interval-level or ratio-level measurement.
i) Weights of cars.
ii) Rankings of squash players.
iii) Temperature of the city.
iv) Salaries of the top five executives in a bank.
v) Marital status.
vi) Ages of students.
vii) Ratings of five players (Poor, Fair, Good, Excellent).
viii) IQ of a student.
ix) Rating of movies.
x) Weights of suitcases on a plane.

1.12 Round off the following continuous data to four significant digits each.
(i) 32.21705, (ii) 937.05002, (iii) 0.003599499,
(iv) 1.003599499, (v) 0.07000455, (vi) 22.2500001.

1.13 a) Distinguish between Primary and Secondary data, giving examples of each.
b) Describe the methods which can be used in the collection of statistical data, stating the
advantages and disadvantages of each method.
c) Enumerate the main sources of errors in Statistics and give their effects.
(P.U., B.A./B.Sc., 1982-5)

1.14 What methods would you employ in the collection of statistical data when the field of inquiry
is (i) small, (ii) fairly large and (iii) very large, if you have to pay due regard to accuracy,
labour, cost and time?

1.15 What are the different methods employed in the collection of data for statistical enquiries? In
what type of inquiry should each one of them be used? (P.U., B.A./B.Sc., 1961)

1.16 Select a newspaper or a magazine article that involves a statistical study and answer the
following questions.
a) Is this study descriptive or inferential? Explain your answer.
b) What variables are used in the study?
c) What level of measurement was used to obtain data?
d) Is population defined in the article? If not, how could it be defined?
e) How might the data have been collected?
f) Do you agree with the conclusions given in the article?

The device of gathering data often results in a massive volume of statistical data, which are in the
form of individual measurements or counts. It is difficult to learn anything by examining the unorganised
data, which is more often confusing than clarifying. The mass of data is therefore to be organised and
condensed into a form that can be more rapidly and easily understood and interpreted. For this purpose,
techniques of classification, tabulation and graphic displays are presented in this chapter.


2.2 CLASSIFICATION


олло E or арты ае same or еі in other “Чо ог 
Classification is thus the sorting of data into homogeneous classes or groups according to their
being alike or not. When the data are sorted according to one criterion only, it is called a simple
classification or a one-way classification. Classification is called a two-way classification when the data
are sorted according to two criteria. A manifold classification or cross-classification is made according to
several criteria.


Data may also be classified according to qualitative, temporal and geographical characteristics.
The arrangement of data according to the values of a variable characteristic is called a distribution.
When the variable is expressed in terms of time, the arrangement of values is referred to as a time series.


2.2.1 Aims of Classification. The main aims of classification are:

i) to reduce the large sets of data to an easily understood summary;
ii) to display the points of similarity and dissimilarity;
iii) to save mental strain by eliminating unnecessary details;
iv) to reflect the important aspects of the data; and
v) to prepare the ground for comparison and inference.


2.2.2 Basic Principles of Classification. While classifying large sets of data, the following points
should be taken into consideration.

i) The classes or categories into which the data are to be divided should be mutually exclusive
and no overlap should exist between successive classes. In other words, classes should be
arranged so that each observation or object can be placed in one and only one class.
ii) The classes or categories should be all inclusive. All inclusive classes are classes that include
all the data.
iii) As far as possible, the conventional classification procedure should be adopted.
iv) The classification procedure should not be so elaborate as to lead to trivial classes, nor
should it be so crude as to concentrate all the data in one or two classes.

2.3 TABULATION


By tabulation, we mean a systematic presentation of data classified under suitable heads and
sub-heads, and placed in columns and rows. This sort of logical arrangement makes the data easy to
understand, facilitates comparisons and provides an effective way to convey information to a reader.
A British statistician, Professor Bowley (1869-1957), refers to tabulation as "the intermediate process
between the accumulation of data, in whatever form they are obtained, and the final reasoned account of
the results shown by the statistics".


2.3.1 Types of Tables. Statistical tables, classified according to purpose, are of two types:
General purpose (primary) tables and Specific purpose (derived or text) tables. The general purpose tables
are large in size, are extensive with vast coverage and are constructed for reference purposes. The specific
purpose tables are simpler in structure and deal with one or two criteria of classification only. Such tables
are used to analyse or to assist in analysing data.


When the classification corresponds to one, two or many criteria or characteristics, the tabulation is
called a single, double or manifold tabulation respectively. Tabulation of a dependent variable (say,
number of students) against the independent variable (say, weight) provides an example of a single
tabulation. Tables with two criteria of classification, e.g. gender and marital status or height and weight,
etc. are examples of double tabulation. An example of manifold tabulation is the presentation of the
population of a country by age, by gender, by residence, by literacy, by livelihood classes, etc.


The main parts of a statistical table are the title, the boxhead, the stub, the body, one or more
prefatory notes, footnotes and a source, etc. They are described in the next section.


2.3.2 Main Parts of a Table and its Construction. The parts of a table and the general rules
to be observed in constructing any table are described below:


a) Title. A table must have a self-explanatory title which should usually tell us the "what, where,
how classified and when" of the data, in that order. Other important points are stated below:
i) Titles should be brief. Complete sentences are unnecessary.
ii) Abbreviations should not be used.
iii) Main titles should be in capital letters. Sub-titles, if any, should be in lower case
with major words capitalized and should indicate clearly what the table describes.
iv) The different parts of a title should be separated by commas, with no full-stop at the end.
v) Words in titles should not be hyphenated except when really necessary.
vi) If a title necessitates the use of two or more lines, an inverted pyramid arrangement of
lines should be used.


b) Column Captions and Boxhead. The heading of each column is called a Column Caption,
while the section of a table that contains the column captions is referred to as Boxhead. Points to
be noted here are given below:

i) The heading should be clear but concise.
ii) They should be arranged in such a way that the most important characteristic is placed in the
first column. The column of totals is usually placed at extreme right, but some people place
the totals on the left.
iii) Only the first word in each column caption should be capitalized. No full-stop should be used at
the end.
iv) Abbreviations, when clear, may be used.
v) Main caption should be centred over the column it is to span.
vi) Extra lines should be used to avoid crowding in caption box.
vii) Whenever possible, caption width should be made roughly proportional to the size of numbers
to be inserted.


c) Row Captions and Stub. The heading or title for a row is called the Row Caption and the
section containing the row captions is known as Stub. The necessary points in this respect are given
below:

i) The principles for column captions apply to row captions in stub.
ii) If the stub is long and has several levels of classification, the major classification should be
capitalized to separate the table into parts.
iii) Whenever the figures have more than four or five significant digits, the digits should be
grouped in threes or fours. For example, one should write 23 178 327, not 23178327.
iv) In long tables, some space should be left after every five or ten rows.
v) Totals should usually be placed at the bottom, but some prefer to place them at the top.
vi) Items in the stub should be arranged so as to facilitate easy reading.
vii) Every stub should have an appropriate heading describing its contents. This heading should be
centred in the upper left box of the table.


d) Prefatory Notes and Footnotes. Explanatory notes incorporated in the table beneath the title
or below the body are called prefatory notes and footnotes respectively.


Prefatory notes give additional specifications of the data indicative of items included or excluded
from the data of the table, statements of the box, etc. They are placed between the title and the boxhead.
The notes should be in lower case alphabet. Footnotes are used to clarify anything in the table by giving a
fuller description, by drawing attention to incompleteness or by stating any special circumstances
affecting the data. The footnotes should be specific in nature. They are placed immediately below the
bottom line of the table, above the source. Footnotes should be placed as follows:

i) If they refer to an entire column or a set of columns, place them at the end of the appropriate
caption.
ii) If they refer to an entire row or a set of rows, place them at the end of the appropriate stub
title.
iii) If they refer to a cell in the table, place them beside the cell entry in the body of the table.
iv) The footnotes should be indicated either by lower case alphabet enclosed in parentheses or by
symbols such as *, †, ‡, etc.; never by a number.


e) Source Notes. Every table should have a source note, unless the table is an original tabulation
or the source is clear from the context. It is placed immediately below the table and below the footnotes,
if any. The source notes must include the compiling agency, publication, date of publication and page, as
they are used as a means of verification and reference.


f) Body and Arrangement of Data. The body of a table is the most important part, which
contains the entire data arranged in columns and rows. A rough sketch enables us to have an idea about
the number of columns and rows required.


Arrangement of the data is made by taking into consideration the basis of classification and the
purpose of the table. Thus the data may be arranged either (i) according to the alphabetical order or
(ii) according to the time of occurrence or (iii) according to location or (iv) according to magnitude or
importance, or (v) by a customary classification, e.g. classifying as men, women and children.
Whatever arrangements are used, the table should be neat, simple and attractive to the eye.

g) Spacing and Ruling. A proper and judicious use of spacing and ruling enhances the
effectiveness of a table and helps in separating or emphasizing certain items in it. Thick or double lines
(rulings) are used for emphasis and for separating the title, the boxhead, the stub, etc., while parts under
captions and related columns are separated by thin or single lines.


h) General. There are some other considerations too, that are enumerated below:
i) A table should be simple. A complex table, if possible, may be broken into relatively simple
tables.
ii) Units of measurements and nature of the data should be specified in title, captions, etc. in
parentheses.
iii) Percentages should be clearly indicated as "per cent of total" etc. and their total should be
shown as 100.0.
iv) If the figures entered in the table are rounded off, this should be indicated in the prefatory note
or in the stub or caption.
v) Zeros need not be entered.
vi) Minus signs are a part of the table and precede the figures.
vii) The relationship of the parts to the whole should be shown by thin or heavy rulings.
viii) The item or items to be emphasized should be placed in the most prominent position of the
table.


The general sketch of a table is given below:

[Sketch of a table showing, from top to bottom: Title; Prefatory notes; Boxhead, Stub and Body;
Footnotes; Source notes]


Example 2.1 A district is divided into two areas, viz. Urban area and Rural area. Total
population of the district is 271,076 out of which only 46,740 live in the urban area. Total male population
of the district is 139,699 and that of urban area is 23,083. Total unmarried population of the district is
112,352 out of which 36,864 are rural females. In the urban area, unmarried people number 21,072 out of
which 12,149 are males.


Prepare a table showing the population of the district by marital status, by residence and by gender.
A rough table, which will probably need amending later, might look as follows:


[Rough table: blank layout, headings not legible in the source]


We first compute the relevant figures as below:

Rural population = Total population − Urban population
= 271,076 − 46,740 = 224,336
Female population = Total population − Male population
= 271,076 − 139,699 = 131,377
Rural male population = District male population − Urban male population
= 139,699 − 23,083 = 116,616
Similarly, Urban females = 46,740 − 23,083 = 23,657
Rural females = 224,336 − 116,616 = 107,720
Married population of District = 271,076 − 112,352 = 158,724
Rural unmarried population = 112,352 − 21,072 = 91,280
Rural unmarried males = 91,280 − 36,864 = 54,416
Urban unmarried females = 21,072 − 12,149 = 8,923 etc.

Having computed all these figures, they are presented in the final table that appears below:

Title:


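POPULATION OF DISTRICT "A" BY GENDER,
MARITAL STATUS AND RESIDENCE

             Married                       Unmarried                     Total
Residence    Male     Female   Total       Male     Female   Total       Male      Female    Total
Rural        62,200   70,856   133,056     54,416   36,864   91,280      116,616   107,720   224,336
Urban        10,934   14,734   25,668      12,149   8,923    21,072      23,083    23,657    46,740
Total        73,134   85,590   158,724     66,565   45,787   112,352     139,699   131,377   271,076
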
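
The subtractions used in Example 2.1 can also be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function name `split` and the dictionary layout are illustrative assumptions. It starts from the eight figures stated in the example and derives every remaining cell of the table.

```python
# Figures stated in Example 2.1 (population counts).
district_total = 271_076
urban_total = 46_740
district_males = 139_699
urban_males = 23_083
district_unmarried = 112_352
rural_unmarried_females = 36_864
urban_unmarried = 21_072
urban_unmarried_males = 12_149


def split(total, males, unmarried, unmarried_males):
    """Derive (male, female) pairs for one area from four known figures.

    Hypothetical helper for illustration only.
    """
    females = total - males
    unmarried_females = unmarried - unmarried_males
    married_males = males - unmarried_males
    married_females = females - unmarried_females
    return {
        "married": (married_males, married_females),
        "unmarried": (unmarried_males, unmarried_females),
        "total": (males, females),
    }


urban = split(urban_total, urban_males, urban_unmarried, urban_unmarried_males)

# Rural figures are obtained as district minus urban.
rural_total = district_total - urban_total
rural_males = district_males - urban_males
rural_unmarried = district_unmarried - urban_unmarried
rural_unmarried_males = rural_unmarried - rural_unmarried_females
rural = split(rural_total, rural_males, rural_unmarried, rural_unmarried_males)

# District cells are the sums of the rural and urban cells.
district = {
    key: tuple(r + u for r, u in zip(rural[key], urban[key]))
    for key in rural
}

for name, area in (("Rural", rural), ("Urban", urban), ("Total", district)):
    cells = [f"{m:>9,}{f:>9,}{m + f:>9,}" for m, f in area.values()]
    print(f"{name:<6}" + "".join(cells))
```

Each printed row lists male, female and total counts for the married, unmarried and overall groups, in that order.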
2.4 FREQUENCY DISTRIBUTION
The organization of a set of data in a table showing the distribution of the data into classes or
groups, together with the number of observations in each class or group, is called a Frequency
Distribution. The number of observations falling in a particular class is referred to as the class frequency
or simply frequency and is denoted by f. Data presented in the form of a frequency distribution are also
called grouped data, while the data in the original form are referred to as ungrouped data. The data are
said to be arranged in an array when arranged in ascending or descending order of magnitude. The
purpose of a frequency distribution is to produce a meaningful pattern for the overall distribution of the
data from which conclusions can be drawn. A fairly common frequency pattern is the rising to a peak and
then declining. In terms of its construction, each class or group has lower and upper limits, lower and
upper boundaries, an interval and a middle value.


2.4.1 Class-limits. The class-limits are defined as the numbers or the values of the variables which
describe the classes; the smaller number is the lower class limit and the larger number is the upper class
limit. Class-limits should be well defined and there should be no overlapping. In other words, the limits
should be inclusive, i.e. the values corresponding exactly to the lower limit or the upper limit be included
in that class. The class-limits are therefore selected in such a way that they have the same number of
significant places as the recorded values. Suppose the data are recorded to the nearest integers. Then an
appropriate method for defining the class limits without overlapping, for example, may be 10 – 14,
15 – 19, 20 – 24, etc. The class limits may be defined as 10.0 – 14.9, 15.0 – 19.9, 20.0 – 24.9, etc. when
the data are recorded to the nearest tenth of an integer. Sometimes a class has either no lower class limit or
no upper class-limit. Such a class is called an open-end class. The open-end classes, if possible, should be
avoided as they are a hindrance in performing certain calculations. A class indicated as 10 – 15 will
include 10 but not 15, i.e. 10 ≤ X < 15.


2.4.2 Class-boundaries. The class-boundaries are the precise numbers which separate one class
from another. The selection of these numbers removes the difficulty, if any, in knowing the class to which
a particular value should be assigned. A class-boundary is located midway between the upper limit of a
class and the lower limit of the next higher class, e.g. 9.5 – 14.5, 14.5 – 19.5, 19.5 – 24.5, or 9.95 – 14.95,
14.95 – 19.95, etc. The class-boundaries are thus always defined more precisely than the level of
measurements being used, so that the possibility of any observation falling exactly on the boundary is
avoided. That is why the class-boundaries carry one more decimal place than the class limits or the
observed values. The upper boundary of a class coincides with the lower boundary of the next class.


2.4.3 Class Mark. A class mark, also called class midpoint, is that number which divides each
class into two equal parts. In practice, it is obtained by dividing either the sum of the lower and upper
limits of a class, or the sum of the lower and upper boundaries of the class, by 2, but in a few cases this
does not hold, particularly in the modern practice of age grouping. For purposes of calculations, each
observation in a particular class is assumed to have the same value as the class-mark or midpoint. This
assumption may introduce an error, called the grouping error, but statistical experience has shown that
such errors usually tend to counterbalance over the entire distribution. The grouping error may also be
minimized by selecting a class (group) in such a way that its midpoint corresponds to the mean of the
observed values falling in that class.


2.4.4 Class Width or Interval. The class-width or interval of a class is equal to the difference
between the class boundaries. It may also be obtained by finding the difference either between two
successive lower class limits, or between two successive class marks. The lower limit of a class should
not be subtracted from its upper limit to get the class interval. An equal class interval, usually denoted by
h or c, facilitates the calculations of statistical constants such as the mean, the standard deviation,
moments, etc. That is why in practice, it is desirable to have equal class-intervals. But in some types of
economic and medical data, it is wise to use unequal class-intervals on account of greater concentration of
measurements in certain classes. Such class intervals usually become uniform when logarithms of class
limits are taken. It should be noted that some people use the terms "class" and "class-interval"
interchangeably and the width of the class is referred to as the size or length of the class-interval.
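
The relations among class limits, class boundaries, class marks and class width described in Sections 2.4.1 to 2.4.4 can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than part of the original text: the function name `describe_classes` and its `unit` parameter (the recording unit of the data, e.g. 1 for nearest integers, 0.1 for nearest tenths) are hypothetical, and `Decimal` is used only to keep the decimal arithmetic exact.

```python
from decimal import Decimal


def describe_classes(limits, unit):
    """Return boundaries, class mark and width for each (lower, upper) limit pair.

    `limits` holds the class limits as strings; `unit` is the recording unit
    of the data. Boundaries lie half a unit beyond the limits.
    """
    half = Decimal(unit) / 2
    rows = []
    for lower, upper in limits:
        lower, upper = Decimal(lower), Decimal(upper)
        lower_boundary = lower - half
        upper_boundary = upper + half
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "mark": (lower + upper) / 2,               # class midpoint
            "width": upper_boundary - lower_boundary,  # not upper - lower
        })
    return rows


integer_rows = describe_classes([("10", "14"), ("15", "19"), ("20", "24")], unit="1")
tenth_rows = describe_classes([("10.0", "14.9"), ("15.0", "19.9")], unit="0.1")

for row in integer_rows + tenth_rows:
    print(row["limits"], row["boundaries"], row["mark"], row["width"])
```

Note that the width comes out as 5 for the class 10 – 14, whereas subtracting the lower limit from the upper limit would give 4, which is why the text warns against that shortcut.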


2.4.5 Constructing a Grouped Frequency Distribution. The following are some basic rules that
should be kept in mind when constructing a grouped frequency distribution:


i) Decide on the number of classes into which the data are to be grouped. There are no hard
and fast rules for deciding on the number of classes, which actually depends on the size of
data. Statistical experience tells us that no less than 5 and no more than 20 classes are
generally used. Use of too many classes will defeat the purpose of condensation and too few
will result in too much loss of information. H.A. Sturges has proposed an empirical rule for
determining the number of classes into which a set of observations should be grouped. The
rule is

$k = 1 + 3.3 \log N$,

where k denotes the number of classes and N is the total number of observations. For example,
if there are 100 observations, then by applying Sturges' rule we should have

$k = 1 + 3.3\,(2.0000) = 7.6$, i.e. 8 classes.

Thus eight classes are required, but this rule is rarely used in practice.

ii) Determine the range of variation in the data, i.e. the difference between the largest and the
smallest values in the data.

iii) Divide the range of variation by the number of classes to determine the approximate width
or size of the equal class-interval. In case of fractional results, the next higher whole number is
usually taken as the size or width of the class-interval. If equal class-intervals are inconvenient or
may be undesirable, then classes of unequal size are used. But in practice, intervals that are
multiples of 5 or 10 are commonly used as people can understand them more readily.

iv) Decide where to locate the lower class-limit of the lowest class and then the lower class boundary.
The lowest class usually starts with the smallest data value or a number less than it. It is better
if it is a multiple of the class-interval. Find the upper class boundary by adding the width of the
class-interval to the lower class-boundary and write down the upper class limits too. The
open-end classes, i.e. classes with the lowermost or uppermost class boundary unknown,
should be avoided if possible.

v) Determine the remaining class-limits and class boundaries by adding the class-interval
repeatedly. The lowest class should be placed at the top and the rest should follow according
to size. In some cases, the highest class is placed at the top.

vi) Distribute the data into the appropriate classes. This is best done by using a "Tally-
Column" where values are tabulated against appropriate classes by merely making short bars
or tally marks to represent them. It is customary for convenience in counting to place the first
four bars vertically and the fifth one diagonally and to leave a space. The number of tallies is
then written in the frequency column. The tally column is usually omitted in the final
presentation of the frequency distribution. But in case of a small number of values, the actual
values should be shown against each class to mitigate chances of error.

vii) Finally, total the frequency column to see that all the data have been accounted for.


These rules are applied to group raw data which are assumed to be continuous. In case of discrete
data which carry only integral values, the concept of a class boundary is unrealistic as there can be no
points where the adjoining classes meet. In spite of this logical difficulty, when the discrete data are
sufficiently large, they are treated for convenience of calculations as continuous and hence are grouped
in the same way as the continuous data.
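
The first three rules above (number of classes, range, class width) are simple arithmetic and can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function names are hypothetical, the logarithm is taken to base 10 as in the worked figure log 100 = 2.0000, and fractional results are rounded up to the next whole number as the text suggests.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + 3.3 log10(N), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(smallest, largest, k):
    """Approximate equal class width: range divided by k, rounded up."""
    return math.ceil((largest - smallest) / k)


k = sturges_classes(100)                   # 1 + 3.3 * 2 = 7.6, taken as 8
width = class_width(68, 204, 7)            # a range of 136 split into 7 classes
print(k, width)
```

In practice the computed width is often adjusted further to a convenient multiple of 5 or 10, as rule iii) notes.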


Example 2.2 Make a grouped frequency distribution from the following data, relating to the
weight recorded to the nearest gram of 60 apples picked out at random from a consignment.
106 107  76  82 109 107 115  93 187  95 123 125
111  92  86  70 126  68 130 129 139 119 115 128
100 186  84  99 113 204 111 141 136 123  90 115
 98 110  78 185 162 178 140 152 173 146 158 194
148  90 107 181 131  75 184 104 110  80 118  82


By scanning the data, we find that the largest weight is 204 grams and the smallest weight is 68
grams, so that the range is 204 − 68 = 136 grams.

Suppose we decide to take 7 classes of equal size. Then the size or width of the equal class interval
would be 136/7 = 19.43. But we take h = 20, the next integral value higher than 19.43, to facilitate
numerical work.

Let us decide to locate the lower limit of the lowest class at 65. With this choice, the class limits
will be 65 – 84, 85 – 104, 105 – 124, ..., the class boundaries become 64.5 – 84.5, 84.5 – 104.5,
104.5 – 124.5, ..., and the class-marks are 74.5, 94.5, 114.5, .... The grouped frequency distribution is
constructed as follows:


i) By listing the actual values:

FREQUENCY DISTRIBUTION OF WEIGHTS OF 60 APPLES

Weight (grams)   Entries
65 – 84          76, 82, 70, 68, 84, 78, 75, 80, 82
85 – 104         93, 95, 92, 86, 100, 99, 90, 98, 90, 104
105 – 124        106, 107, 109, 107, 115, 123, 111, 119, 115, 113,
                 111, 123, 115, 110, 107, 110, 118
125 – 144        125, 126, 130, 129, 139, 128, 141, 136, 140, 131
145 – 164        162, 152, 146, 158, 148
165 – 184        178, 173, 181, 184
185 – 204        187, 186, 204, 185, 194

This table is sometimes known as an entry table. The values against each class may be arranged in
an array.


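ii) By using a Tally-Column:

FREQUENCY DISTRIBUTION OF WEIGHTS OF 60 APPLES

Weight (grams)   Class boundaries   Tally                     Frequency
65 – 84          64.5 – 84.5        ||||/ ||||                 9
85 – 104         84.5 – 104.5       ||||/ ||||/               10
105 – 124        104.5 – 124.5      ||||/ ||||/ ||||/ ||      17
125 – 144        124.5 – 144.5      ||||/ ||||/               10
145 – 164        144.5 – 164.5      ||||/                      5
165 – 184        164.5 – 184.5      ||||                       4
185 – 204        184.5 – 204.5      ||||/                      5
Total                                                         60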
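
The tallying step of Example 2.2 can be reproduced with a few lines of Python. The sketch below is illustrative only: the function name `frequency_table` is an assumption, and the 60 weights are entered as they are distributed in the entry table of the example. Because the data are whole grams and the classes have equal width, the class index of a value is found by integer division.

```python
weights = [
    106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125,
    111, 92, 86, 70, 126, 68, 130, 129, 139, 119, 115, 128,
    100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115,
    98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194,
    148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82,
]


def frequency_table(data, lowest_limit, width, n_classes):
    """Count integer-valued data into equal-width classes.

    Returns a list of ((lower limit, upper limit), frequency) pairs.
    """
    counts = [0] * n_classes
    for value in data:
        counts[(value - lowest_limit) // width] += 1
    return [
        ((lowest_limit + i * width, lowest_limit + (i + 1) * width - 1), counts[i])
        for i in range(n_classes)
    ]


table = frequency_table(weights, lowest_limit=65, width=20, n_classes=7)
for (lower, upper), f in table:
    print(f"{lower:>3} - {upper:<3}  {f:>2}")
print("Total   ", sum(f for _, f in table))
```

Totalling the frequency column at the end corresponds to rule vii): all 60 observations should be accounted for.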


Example 2.3 Given below are the mean annual death rates per 1,000 at ages 20 – 65 in each of 88
occupational groups. Construct a grouped frequency distribution.


7:5 82 62 89 es 547 093^ SS 109 108 74 Li 

97 116 126 ENS 02:092. pA OD 71:3. 73 84 12 
103 101 MAY 1.1 65 125 78 645 84. 193. 124 Ag 
104 91 Ч%У оз 62 103 66 74 SOE 74^ 94 23 

TI TRE 87 5.5 86 96 119 104 78 76 121 1% 


46 140 81 114 106 116 104 81 46 66 128 
68 71 66 88 88 107 108 60 79 73 93 
93 80 101 23/550 7697 790. 88. 7940 114» 109 
(B.I.S.E., Lahore, 1971)


A scan of the data shows that the largest value is 14.0 and the smallest value is 3.9, so that the range
is 14.0 − 3.9 = 10.1.


As the data are recorded to one decimal place, we may therefore locate the lower limit of the first
group at 3.5. Let us choose a class interval of 1.0. Then the class limits are specified as 3.5 – 4.4,
4.5 – 5.4, 5.5 – 6.4, .... With this choice, the class-boundaries are 3.45 – 4.45, 4.45 – 5.45, 5.45 – 6.45, ...,
which do not coincide with the given values.


The following table shows the required frequency distribution:

FREQUENCY DISTRIBUTION OF MEAN DEATH RATES

Class limits     Class boundaries
3.5 – 4.4        3.45 – 4.45
4.5 – 5.4        4.45 – 5.45
5.5 – 6.4        5.45 – 6.45
6.5 – 7.4        6.45 – 7.45
7.5 – 8.4        7.45 – 8.45
8.5 – 9.4        8.45 – 9.45
9.5 – 10.4       9.45 – 10.45
10.5 – 11.4      10.45 – 11.45
11.5 – 12.4      11.45 – 12.45
12.5 – 13.4      12.45 – 13.45
13.5 – 14.4      13.45 – 14.45


Example 2.4 Construct a frequency distribution for the data below. Indicate the class boundaries
and class limits clearly.
[Data table: values not legible in the source]


By scanning the data, we find that the largest value is 81.71 and the smallest value is 18.95, so that
the range is 81.71 − 18.95 = 62.76.


Suppose we decide to take 5 classes of equal size. Then the size or width of the equal class interval
would be 62.76/5 = 12.55. But we take h = 13.00, the next integral value higher than 12.55, to facilitate
numerical work.

As the data are recorded to two decimal places, we may locate the lower limit of the first group at
18.00. With this choice, the class limits will be 18.00 – 30.99, 31.00 – 43.99, ..., and the class boundaries
become 17.995 – 30.995, 30.995 – 43.995, .... The grouped frequency distribution is then constructed as
follows:


Class limits       Class boundaries
18.00 – 30.99      17.995 – 30.995
31.00 – 43.99      30.995 – 43.995
44.00 – 56.99      43.995 – 56.995
57.00 – 69.99      56.995 – 69.995
70.00 – 82.99      69.995 – 82.995


Example 2.5 A survey of 50 retail establishments had assistants, excluding proprietors, as
follows:
2422792 1056042 25 
3 


4 ‘ , s 
25726288 eE A Ey O n ah a HE Fae, Pee 35 
Ору ПЕРО ОЛИСИ X E 2; 
Arrange the values as a frequency distribution.


By scanning the data, we find that the number of assistants is a discrete variable and the range is
small, so the data can be conveniently sorted by taking the values of classes as 0, 1, 2, etc. The frequency
distribution is then constructed as shown below:


FREQUENCY DISTRIBUTION OF ASSISTANTS IN 50 RETAIL ESTABLISHMENTS


Such a frequency distribution in which each class consists of a single value is sometimes called a
discrete or ungrouped frequency distribution.
2.4.6 Cumulative Frequency Distribution. The total frequency of a variable from its one end to a
certain value (usually upper class boundary in grouped data), called the base, is known as the cumulative
frequency, less than or more than the base of the variable. A table that shows the cumulative frequencies
is called a cumulative frequency distribution. The cumulative frequency of the last class is the sum of all
frequencies in the distribution. If the cumulation process is from the lowest value to the highest, it is
referred to as a "less than" type cumulative frequency distribution. For example, let us consider a
frequency distribution having k classes, each of width h. Let us denote the midpoint of the ith class by
$x_i$ with frequency $f_i$ such that $\sum_{i=1}^{k} f_i = n$. Now the lower class-boundary of the first
group is $x_1 - h/2$ and the upper class boundaries are $x_i + h/2$ $(i = 1, 2, \ldots, k)$. The cumulative
frequency distribution is then obtained by adding each successive frequency to the cumulative total of
frequencies for the preceding classes as shown below:

Class boundaries            Cumulative frequency
less than $x_1 - h/2$       $0$
less than $x_1 + h/2$       $f_1$
less than $x_2 + h/2$       $f_1 + f_2$
...                         ...
less than $x_i + h/2$       $f_1 + f_2 + \cdots + f_i$
...                         ...
less than $x_k + h/2$       $f_1 + f_2 + \cdots + f_k = n$

It should be noted that a less than type cumulative frequency distribution starts with the lower
boundary of the first group, indicating that there is no frequency below $x_1 - h/2$.


When the frequencies are cumulated from the highest value to the lowest value, it is called a "more
than" type cumulative frequency distribution.


If the class frequencies of the various classes are divided by the total frequency, we get the
relative frequencies, which always add to one. The class frequencies may also be expressed as
percentages, the total of which would be 100. A percentage cumulative distribution is useful to read off
the percentage of values falling between certain specified values.


Example 2.6 Construct (i) a "less than" type cumulative distribution, and (ii) a "more than" type
cumulative distribution from the frequency distribution of weights of 60 apples of Example 2.2.
i) A "less than" type cumulative frequency distribution is shown below:

Weight (grams)      Cumulative frequency
Less than 64.5       0
Less than 84.5       9
Less than 104.5     19
Less than 124.5     36
Less than 144.5     46
Less than 164.5     51
Less than 184.5     55
Less than 204.5     60
ii) A "more than" type cumulative frequency distribution is given below:

Weight (grams)      Cumulative frequency
More than 64.5      60
More than 84.5      51
More than 104.5     41
More than 124.5     24
More than 144.5     14
More than 164.5      9
More than 184.5      5
More than 204.5      0
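
Both cumulative distributions of Example 2.6, together with the relative frequencies mentioned in Section 2.4.6, follow from running totals of the class frequencies. The Python sketch below is an illustration only; the variable names are assumptions, and the frequencies are those obtained for the 60 apples in Example 2.2.

```python
from itertools import accumulate

# Class boundaries and class frequencies of the 60 apple weights (Example 2.2).
boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5, 204.5]
frequencies = [9, 10, 17, 10, 5, 4, 5]

# "Less than" type: running total, starting from 0 below the first lower boundary.
less_than = [0] + list(accumulate(frequencies))
total = less_than[-1]

# "More than" type: whatever is not yet cumulated lies above the boundary.
more_than = [total - c for c in less_than]

# Relative and percentage frequencies of the classes.
relative = [f / total for f in frequencies]
percentage = [100 * r for r in relative]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>6}  less than: {lt:>2}  more than: {mt:>2}")
```

At every boundary the "less than" and "more than" figures add up to the total frequency, which gives a quick check on the two tables.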


A clear disadvantage of using a frequency table is that the identity of individual observations is lost
in the grouping process. To overcome this drawback, John Tukey (1977) introduced a technique known as
the Stem-and-Leaf Display. This technique offers a quick and novel way for simultaneously sorting and
displaying data sets where each number in the data set is divided into two parts, a Stem and a Leaf. A
stem is the leading digit(s) of each number and is used in sorting, while a leaf is the rest of the number or
the trailing digit(s) and is shown in the display. A vertical line separates the leaf (or leaves) from the stem.
For example, the number 243 could be split two ways:

Leading digit 2, trailing digits 43:     Stem 2  | Leaf 43
Leading digits 24, trailing digit 3:     Stem 24 | Leaf 3

All possible stems are arranged in order from the smallest to the largest and placed on the left hand side
of the line.

The stem-and-leaf display is a useful step for listing the data in an array; leaves are associated with
the stem to know the numbers. The stem-and-leaf table provides a useful description of the data set and
can easily be converted to a frequency table. It is a common practice to arrange the trailing digits in each
row from smallest to highest.

Example 2.7 The ages of 30 patients admitted to a certain hospital during a particular week were
as follows:

48, 31, 54, 37, 18, 64, 61, 43, 40, 71, 51, 12, 52, 65, 53 

42, 39, 62, 74, 48, 29, 67, 30, 49, 68, 35, 57, 26, 27, 58 


Construct a stem-and-leaf display from the data and list the data in an array.


A scan of the data indicates that the observations range (in age) from 12 to 74. We use the first
(or leading) digit as the stem and the second (or trailing) digit as the leaf. The first observation is 48,
which has a stem of 4 and a leaf of 8, the second a stem of 3 and a leaf of 1, etc. Placing the leaves in the
order in which they appear in the data, we get the stem-and-leaf display shown below:

Stem (leading digit)    Leaf (trailing digit)
1                       8 2
2                       9 6 7
3                       1 7 9 0 5
4                       8 3 0 2 8 9
5                       4 1 2 3 7 8
6                       4 1 5 2 7 8
7                       1 4


To get the array, we associate the leaves in order of size with the stems as shown below: 
12, 18, 26, 27, 29, 30, 31, 35, 37, 39, 40, 42, 43, 48, 48, 49, 
51, 52, 53, 54, 57, 58, 61, 62, 64, 65, 67, 68, 71, 74 
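
The construction in Example 2.7 can be reproduced with a short Python sketch. The function name `stem_and_leaf` is introduced here for illustration only, and the sketch assumes two-digit values so that the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

# Ages of the 30 patients from Example 2.7.
ages = [48, 31, 54, 37, 18, 64, 61, 43, 40, 71, 51, 12, 52, 65, 53,
        42, 39, 62, 74, 48, 29, 67, 30, 49, 68, 35, 57, 26, 27, 58]

def stem_and_leaf(values):
    """Group two-digit values by leading digit (stem), keeping leaves in data order."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # 48 -> stem 4, leaf 8
        display[stem].append(leaf)
    return dict(sorted(display.items()))

display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(f"{stem} | {''.join(map(str, leaves))}")

# Ordering the leaves within each stem and re-attaching the stems gives the array.
array = [10 * stem + leaf for stem, leaves in display.items() for leaf in sorted(leaves)]
print(array)
```

Sorting the leaves inside each row before printing would give the ordered display that is common practice.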


Example 2.8 Construct a stem-and-leaf display for the data of annual death rates given in
Example 2.3.


Using the decimal part in each number as the leaf and the rest of the digits as the stem, we get the
following stem-and-leaf display (leaves are ordered):
o 


225566689 
3334456778889


1124667788899 
012333344467799 
011233446678899 
144669 

0145688 

0 


2.6 GRAPHICAL REPRESENTATION 


Tabulation, we know, is a good method of condensing and representing statistical data in a readily
understandable form, but many people have no taste for figures. They would prefer a
representation where figures could be avoided. This purpose is achieved by the presentation of statistical
data in a visual form. The visual display of statistical data in the form of points, lines, areas and other
geometrical forms and symbols is, in the most general terms, known as Graphical Representation.
Statistical data can be studied with this method without going through figures presented in the form of
tables.

Such visual representation can be divided into two main groups, graphs and diagrams, to be
described in the sections that follow. The basic difference between a graph and a diagram is that a graph
is a representation of data by a continuous curve, usually shown on a graph paper, while a diagram is any
other one, two or three-dimensional form of visual representation.


2.7 DIAGRAMS


Diagrammatic representation is best suited to spatial series and data split into different categories.
Whenever a comparison of the same type of data at different places is to be made, diagrams will be the
best way to do that. Diagrammatic representation has several advantages over tabular representation of
figures. Beautifully and neatly constructed diagrams are more attractive than simple figures. Diagrams,
being a visual display, leave a more effective and long lasting impression on the mind, and they
make unwieldy data intelligible at a glance. Comparison is made easier with diagrams. Diagrams have
some disadvantages too. Diagrams are less accurate than tables, cost money and time, and the amount of
information conveyed is limited. However, this method of representation is extensively used in business
administration.


Different types of diagrams or charts commonly used for displaying statistical data are described
below:

i) Linear or One-dimensional Diagrams. They consist of Simple Bars, Multiple Bars and
Component Bars. Here the values are represented by only one dimension, generally the
length of the bar.

ii) Areal or Two-dimensional Diagrams. They consist of Rectangles, Sub-divided Rectangles
and Squares, the areas of which are proportional to the values of the given quantities. This
device is used for data with moderately large variations.

iii) Three-dimensional Diagrams. They are in the form of Cubes and Cylinders, whose
volumes are proportional to the values they represent. These diagrams are used when the
variation among the values of the data to be portrayed is so large that even the square roots of
the values concerned fail to reduce the variation appreciably.

iv) Pie-Diagrams. They are in the form of Circles and Sectors. Here the areas of circles or sectors
are in proportion to the values they represent or compare.

v) Pictograms. They consist of pictures or small symbolic figures representing the statistical
data. A pictogram is an effective way of making visual comparisons. For example, we can compare
the armed strength of various countries by drawing pictures of the number of soldiers, where
each pictorial soldier may denote, say, 1,000 soldiers. In a similar way, the production of
wheat can be compared by means of the pictures of wheat bags of a specified size. It is
essential to repeat the pictures a number of times to represent the differences in magnitudes.


In constructing diagrams, the following points should be kept in mind:

i) An appropriate scale, consistent with the size of paper available and the size of the data to be
represented, should be chosen and indicated either at the side or at the bottom of the diagram.
This scale must start at zero.

ii) A diagram, like a table, must have a title, which should be brief and self-explanatory. A key,
footnote or source will also be necessary.

iii) A diagram should be shaded, coloured or cross-hatched to show the different parts, if any.
Lettering should be shown horizontally.
2.7.1 Simple Bar Chart. A simple bar chart consists of horizontal or vertical bars of equal
widths and lengths proportional to the values they represent. As the basis of comparison is one-
dimensional, the widths of these bars have no significance but are taken to make the chart look attractive.
The space separating the bars should not exceed the width of the bar and should not be less than half
its width. The bars should neither be exceedingly long and narrow nor short and broad. The vertical bar
chart is an effective way of presenting a time series and qualitatively classified data, whereas horizontal
bars are useful for geographical or spatial distributions. Data that do not relate to time should be
arranged in ascending or descending order before charting.


Example 2.9 Draw a simple bar diagram to represent the turnover of a company for 6 years.


Years:                1980     1981     1982     1983     1984     1985
Turnover (Rupees):    38,000   45,000   48,000   52,500   55,000   58,000


The bar chart is drawn below:

[Figure: Bar diagram showing the turnover of a company for 6 years; horizontal axis: Year]

2.7.2 Multiple Bar Chart. A multiple bar chart shows two or more characteristics corresponding
to the values of a common variable in the form of grouped bars, whose lengths are proportional to the
values of the characteristics, and each of which is shaded or coloured differently to aid identification.
This is a good device for the comparison of two or three kinds of information. For example, three
related series of a country can be compared from year to year by grouping the three bars together.


Example 2.10 Draw multiple bar charts to show the area and production of cotton in the Punjab.

(Source: Statistical Wing, Agriculture Deptt. Lahore)
The multiple bar charts are drawn below:

[Figure: Multiple bar chart, "Area and Production of Cotton in the Punjab"; key: area in acres,
production in bales; horizontal axis: Year]

2.7.3 Component Bar Chart. A component bar chart is an effective technique in which each bar
is divided into two or more sections, proportional in size to the component parts of a total being displayed
by each bar. The various component parts, shown as sections of the bar, are shaded or coloured differently
to increase the overall effectiveness of the diagram. Component bar charts are used to represent the
cumulation of the various components of data and the percentages. They are also known as sub-divided
bars.


The appropriate diagram, after arranging the population figures in ascending order, is drawn below:

[Figure: Component bar chart showing population of 4 divisions]
2.7.4 Rectangles and Sub-divided Rectangles. The area of a rectangle is equal to the product of
its length and breadth. To represent a quantity by a rectangle, both length and breadth of the rectangle are
used. Sub-divided rectangles are drawn for the data where the quantities along with their components are
to be compared. These diagrams are generally drawn to compare the budgets of various families. In the
construction of sub-divided rectangles, we are required to

i) change each component into the percentage of the corresponding total,


ii) draw one rectangle for each total, taking equal lengths (100 units) and breadths proportional to
the totals,


iii) divide every rectangle so drawn into parts equal in number to the number of components.
Each part, shaded or coloured, will represent the percentage size of one component.


Example 2.12 Compare the budgets of families A and B with a suitable diagram.


Items of Expenditure      Family A      Family B

Food
Clothing
House Rent
Education
Litigation
Conventional Needs
Miscellaneous


The necessary computations required for the drawing of sub-divided rectangles are given below
and the diagram follows.


Items of Expenditure      Family A                          Family B
                          Actual Expenses   % Expenses      Actual Expenses   % Expenses

Food
Clothing
House Rent
Education
Litigation
Conventional Needs
Miscellaneous
[Figure: Sub-divided rectangles comparing the budgets of families A and B on a percentage scale
(0 to 100); sections labelled Food, Clothing, House Rent, Education, Litigation, Conventional Needs
and Misc.]

2.7.5 Pictograms. A pictogram is a popular device for portraying the statistical data by means of
pictures or small symbols. It is said that a picture is worth ten thousand words. It is customary to
represent a unit value of the data by a standard symbol or picture and the whole quantity by an
appropriate number of repetitions of the symbol concerned. Thus the larger quantities should be
represented by a larger number of symbols and not by larger symbols. A quantity smaller than the unit is
represented by a part of the picture or symbol used. The symbols or pictures to be used must be simple
and clear. A pictogram is virtually a bar chart constructed in a pictorial way, as the number of symbols or
pictures corresponds to the length of a bar.


Example 2.13 The following data show the number of employees in a certain Textile Mills.


Representing 1,000 employees by one picture, the pictogram is drawn below:

[Figure: Pictogram by year; each picture represents 1,000 employees]
2.7.6 Pie Diagrams. A pie-diagram, also known as a sector diagram, is a graphic device consisting
of a circle divided into sectors or pie-shaped pieces whose areas are proportional to the various parts into
which the whole quantity is divided. The sectors are shaded or coloured differently to show the
relationship of parts to the whole. If space permits, the descriptive titles of the constituent parts should be
placed horizontally on each sector, otherwise a key becomes necessary. It is a convenient way of
displaying the component parts in proportion to the total and therefore is used as an alternative to the
component bar chart. It is an effective way of showing percentage parts when the whole quantity is taken
as 100. It is also used when the basic categories are not quantifiable, as with expenditure classified into
food, clothing, fuel and light, etc. The arrangement of the sectors must be made uniform in comparing pie
charts.

To construct a pie chart, draw a circle of any convenient radius. As a circle consists of 360°, the
whole quantity to be displayed is equated to 360. The proportion that each component part or category
bears to the whole quantity will be the corresponding proportion of 360°. These corresponding
proportions, i.e. angles, are calculated by the formula

$\text{Angle} = \dfrac{\text{Component part}}{\text{Whole quantity}} \times 360^\circ$

Then divide the circle into different sectors by constructing angles at the centre by means of a protractor
and draw the corresponding radii.
Example 2.14 Represent the total expenditure and expenditure on various items of a family by a
pie diagram.

Items:                   Food   Clothing   House Rent   Fuel and Light   Misc.
Expenditure (in Rs.):    50     30         20           15               35


The corresponding angles needed to draw the pie diagram are computed below.


Items             Expenditure (in Rs.)    Angles of the Sectors (in Degrees)
Food              50                      120
Clothing          30                      72
House Rent        20                      48
Fuel and Light    15                      36
Miscellaneous     35                      84
Total             150                     360


The pie diagram, consisting of a circle divided into five sectors defined by angles 120°, 72°, 48°,
36° and 84°, is drawn below:

[Figure: Pie diagram]
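
The angle formula can be checked with a small Python sketch. The helper name `sector_angles` is introduced here for illustration. The figures for Food (50), Fuel and Light (15) and Misc. (35) are those of Example 2.14; the values used for Clothing (30) and House Rent (20) are assumptions inferred from the stated angle of 72° and the requirement that the five angles add to 360°.

```python
def sector_angles(parts):
    """Angle of each sector: (component part / whole quantity) x 360 degrees."""
    total = sum(parts.values())
    return {name: value * 360 / total for name, value in parts.items()}

# Expenditure in Rs.; Clothing and House Rent are inferred values (see the note above).
expenditure = {
    "Food": 50,
    "Clothing": 30,
    "House Rent": 20,
    "Fuel and Light": 15,
    "Misc.": 35,
}

angles = sector_angles(expenditure)
for item, angle in angles.items():
    print(f"{item:15s} {angle:6.1f} degrees")
print("Total:", sum(angles.values()))
```

Because each angle is a fixed share of 360°, the angles always add to 360° whatever the total expenditure is.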
2.7.7 Profit and Loss Chart. This is virtually a percentage component bar chart in which profits
can be shown above the normal base line and losses below the base line.


iii) Polishing, etc. 
Total cost 

Proceeds 

Profit ( + ) or loss (—) 


A pie chart may also be used for this purpose.

2.8 GRAPHS


As already stated, diagrams are useful for representing spatial series. Diagrams fail when we want
to represent a statistical series spread over a period of time, or a frequency distribution or two related
variables in visual form. For such representations, graphs are employed.


Graphs present the data in a simple, clear and effective manner, facilitate comparison between two
or more than two statistical series, and help us in appreciating their significance readily. Another
advantage of graphs is that they provide an overall picture of a statistical series. Graphs are also
used to make predictions and forecasts. Certain partition values can also be located
graphically. But graphs are less accurate as they do not give minute details, and they cost
expenditure and time.


Construction of Graphs. In the construction of a graph, the first step is to take a starting point,
known as the origin, in the left-hand bottom corner of the graph paper. Two straight lines perpendicular to each
other are drawn through the origin. The horizontal line is called the X-axis or abscissa and the vertical
line is labelled as the Y-axis or ordinate. The two lines together are known as co-ordinate axes. Suitable
scales are selected along the X-axis and Y-axis. The independent variable is taken along the X-axis and the
dependent variable along the Y-axis. Points are plotted and joined to get the required graph. While
constructing a graph, the following points should be kept in mind:

i) A scale and the form of representation are to be selected in such a way that the true impression
of the data to be represented is given by the graph.

ii) Every graph must have a clear and comprehensive title at the top. Where necessary, sub-titles
should be added.

iii) The source of the data must be given. A key and footnotes should be provided when
necessary.

iv) The independent variable should always be placed on the horizontal axis.

v) The vertical scale should always begin with zero, otherwise the graph will give a false
impression. If, however, the first item of the data is quite large, a scale-break should be shown
between zero and the next number.

vi) The horizontal axis does not have to begin with zero unless, of course, the independent
variable or the lower limit of the first class interval is zero.

vii) The axes of the graph should be properly labelled. Labels should clearly state both the variable
and the units, e.g. "Distance" and "Kilometres", "Sales" and "Rupees", etc.

viii) Curves, if more than one, must be clearly distinguished either by different colours or by
differentiated lines (solid, dashed, dotted).

ix) The graph should not be loaded with too many curves.

Graphs can be divided into two main categories, namely:

a) Graphs of Time Series or Graphs of Historical Data, and

b) Graphs of Frequency Distributions. The important graphs of frequency distributions are the
Histogram, Frequency Polygon, Frequency Curve and the Cumulative Frequency Curve or
Ogive.
2.8.1 Graph of Time Series-Historigram. A curve showing changes in the value of one or more
items from one period of time to the next is known as the graph of a time series. This curve is also called
a Historigram. Thus a historigram displays the variations in time series dealing with prices, production,
imports, population, etc. To construct a historigram, time is taken along the X-axis and the values of the
variables along the Y-axis. Points are plotted and are then connected by means of straight line segments to get
the Historigram.

Example 2.15 The following table gives the number of cars produced in Germany during the
years 1929-1936. Draw a suitable graph, i.e. Historigram, of the series.

Years:          1929   1930   1931   1932   1933   1934   1935   1936
No. of Cars:    98     74     68     50     99     172    245    302


The historigram is drawn for the data by taking years on the horizontal axis and the number of cars on
the vertical axis, as below:

[Figure: Historigram of the number of cars produced, 1929-1936]


2.8.2 Histogram. A histogram consists of a set of adjacent rectangles whose bases are marked off by
class boundaries (not class limits) on the X-axis and whose heights are proportional to the frequencies
associated with the respective classes. The area of each rectangle represents the respective class frequency.
It is one of the most important graphical representations of a frequency distribution. When the class-
intervals are equal, the rectangles all have the same width and their heights directly represent the class
frequencies, that is, they are numerically proportional to the frequencies in the respective classes. The
following figure shows the histogram for the frequency distribution of Example 2.3.

[Figure: Histogram for frequency distribution of annual death rates]
If the class-intervals are not all equal, the height of the rectangle over an unequal class-interval has
to be adjusted because it is area and not height that measures frequency. This means that the height of a
rectangle must be proportionally decreased if the length of the corresponding class-interval increases.
For example, if the length of a class-interval becomes double, then the height of the rectangle is to be
halved so that the area, being the fundamental property of the rectangle of a histogram, remains
unchanged. This sort of rescaling is necessary so that the correct pattern of the distribution is
conveyed.

When the frequencies in a frequency distribution are given against the class-marks $x_i$ of
class-intervals of width $h$, a histogram is constructed by drawing vertical lines (dotted) whose heights
correspond to the respective class-frequencies at the class-marks marked off on the axis of X, and then
a series of adjacent rectangles with bases extending over $x_i \pm h/2$ (i.e. half of the width is taken on either
side of $x_i$).

It is important to note that in the construction of a histogram, we assume that within any one class
the values of the variable are evenly spread out between the class-boundaries. A histogram, which should
not be confused with the historigram (graph of a time series), is useful in forming a rough idea of the
overall pattern and shape of the frequency distribution.

Example 2.16 Construct a histogram for the following frequency distribution relating to the ages
(to nearest birthday) of telephone operators.


Age (years):         18-19   20-24   25-29   30-34   35-44   45-59
No. of operators:    9       188     160     123     84      15


As the class-intervals are unequal, the height of each rectangle cannot be made equal to the frequency.
The height of a rectangle is therefore calculated by dividing the frequency (the area) by the corresponding
class interval (the width). The necessary calculations and the histogram follow:


Class boundaries    Frequency    Width    Height of rectangle
17.5 - 19.5         9            2        9 ÷ 2 = 4.5
19.5 - 24.5         188          5        188 ÷ 5 = 37.6
24.5 - 29.5         160          5        160 ÷ 5 = 32.0
29.5 - 34.5         123          5        123 ÷ 5 = 24.6
34.5 - 44.5         84           10       84 ÷ 10 = 8.4
44.5 - 59.5         15           15       15 ÷ 15 = 1.0
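
The height adjustment for unequal class-intervals can be sketched in Python as follows, using the class boundaries and frequencies of Example 2.16. The variable names are chosen for this illustration only.

```python
# Class boundaries and frequencies from Example 2.16.
boundaries = [(17.5, 19.5), (19.5, 24.5), (24.5, 29.5),
              (29.5, 34.5), (34.5, 44.5), (44.5, 59.5)]
frequencies = [9, 188, 160, 123, 84, 15]

# Width of each class-interval (upper boundary minus lower boundary).
widths = [upper - lower for lower, upper in boundaries]

# Height = frequency / width, so that the AREA of each rectangle equals the frequency.
heights = [f / w for f, w in zip(frequencies, widths)]

for (lower, upper), f, w, h in zip(boundaries, frequencies, widths, heights):
    print(f"{lower:4.1f} - {upper:4.1f}   f = {f:3d}   width = {w:4.1f}   height = {h:5.1f}")

# Check: height x width recovers the frequency for every class.
areas = [h * w for h, w in zip(heights, widths)]
```

Doubling a width while keeping the frequency fixed halves the height, which is exactly the rescaling described in the text.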

[Figure: Histogram for unequal class intervals]


2.8.3 Frequency Polygon. A frequency polygon is a graphic form of a frequency distribution,
which is constructed by plotting the points $(x_i, f_i)$, where $x_i$ is the class-mark of the $i$th class and $f_i$ is
the corresponding frequency, and then connecting them by straight line segments, provided the class-
intervals are equal. When the class-intervals are unequal, the frequencies are adjusted by the
same technique that was used for the histogram. It can also be obtained by joining the mid-points of the
tops of the successive rectangles in the histogram by means of straight line segments. The graph drawn in
this way does not touch the horizontal axis. But a polygon, as we know, is a closed figure having many
sides. It is therefore necessary to add "extra" class marks, with zero frequencies, at both ends of the
distribution so that the graph is closed. The frequency polygon for the frequency distribution of weights of
Example 2.2 is given below:

[Figure: Frequency polygon for frequency distribution of weights; horizontal axis: Mid points]

A frequency polygon, which can be used for comparing two or more data sets, gives roughly the position
of the mode and some idea of the skewness and kurtosis of the curve (these terms are defined later).

2.8.4 Frequency Curve. When a frequency polygon or a histogram constructed over class-
intervals made sufficiently small for a large number of observations is smoothed, it approaches a
continuous curve, called a frequency curve. The concept of a frequency curve is of great importance in
statistics. Mathematically, the curve is represented by the relation $y = f(x)$ and has an important
property concerning its area. The following graph represents the histogram and frequency curve for the
frequency distribution of the annual death rates of Example 2.3.

[Figure: Histogram and frequency curve for the frequency distribution of annual death rates]


2.8.5 Cumulative Frequency Polygon or Ogive. A cumulative frequency polygon, popularly
known as an Ogive (rhymes with "alive" and pronounced o'jiv), is a graph obtained by plotting the
cumulated frequencies of a distribution against the upper or lower class boundaries, depending upon
whether the cumulation is of the "less than" or "more than" type, and the points are joined by straight line
segments. Because of its likeness to an architectural moulding called an ogee, a cumulative frequency
polygon is called an Ogive. An Ogive, when the cumulation is of the less-than type, is constructed by
plotting the points $(x_i + h/2, F_i)$, where $x_i + h/2$ is the upper class-boundary of the $i$th class and $F_i$ is
the cumulative frequency for the $i$th class, and connecting the successive points by straight line segments.
The polygon should start from the lower class boundary of the first interval, i.e. the point $(x_1 - h/2, 0)$
is plotted and joined, and to get a polygon, the last point is also joined with the last upper class-
boundary. In case of unequal class-intervals, we merely join the unequally spaced points.

[Figure: Cumulative frequency polygon (ogive) for frequency distribution of weights of 60 apples;
vertical axis: Cumulative frequency]
If relative frequencies are used, the cumulative frequency polygon rises from the value 0 at the left
to the value 1 at the right. A smoothed Ogive is called an Ogive curve, which is often used to locate the
partition values such as the median, quartiles, percentiles, etc. of a frequency distribution.
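
Reading a partition value off a "less than" ogive can be sketched in Python as below. This is an illustration under stated assumptions: the frequencies are hypothetical, the helper names `ogive_points` and `read_off` are introduced here, and the value is read from the straight-line polygon by linear interpolation rather than from a smoothed curve.

```python
from itertools import accumulate

def ogive_points(lower_boundary, width, frequencies):
    """Points (class boundary, cumulative frequency), starting at (x_1 - h/2, 0)."""
    cumulative = [0] + list(accumulate(frequencies))
    xs = [lower_boundary + i * width for i in range(len(cumulative))]
    return list(zip(xs, cumulative))

def read_off(points, target):
    """X value at which the polygon reaches the cumulative frequency `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Linear interpolation along the segment joining the two plotted points.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of cumulative frequencies")

# Hypothetical distribution: 4 classes of width 20 starting at boundary 64.5.
frequencies = [5, 12, 20, 3]
points = ogive_points(lower_boundary=64.5, width=20, frequencies=frequencies)

n = sum(frequencies)
median = read_off(points, n / 2)        # value below which half the observations lie
lower_quartile = read_off(points, n / 4)
print(points)
print(median, lower_quartile)
```

Other partition values follow the same pattern: a percentile is read off at the corresponding fraction of the total frequency.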


A percentage cumulative frequency polygon or curve may also be drawn by expressing the
cumulative frequencies as percentages of the total frequency and then connecting the plotted percentages
against upper class boundaries. This graphic device is useful for comparing two or more frequency
distributions as they are adjusted to a uniform standard.


2.8.6 Ogive for a Discrete Variable. When a variable X is discrete, its cumulative frequency
polygon consists of horizontal line segments between any two successive values and has a jump of height
$f_i$ at each value of $x_i$. In other words, the cumulative distribution increases only in jumps and is
constant between jumps. For the purpose of illustration, the cumulative frequency polygon drawn for the
frequency distribution of assistants in Example 2.5 is shown below:

[Figure: Ogive for discrete variable; vertical axis: Cumulative frequency]

This graph shows that the cumulative frequency polygon is stepped. Such a function is called a
step-function.


2.8.7 Types of Frequency Curve. The frequency distributions occurring in practice usually
belong to one of the following types:


i) The Symmetrical Distributions. A frequency distribution or curve is said to be symmetrical
if values equidistant from a central maximum have the same frequencies, i.e. the curve can be
folded along the central maximum in such a way that the two halves of the curve coincide.
The Normal curve is an important example of a symmetrical distribution.

ii) The Moderately Skewed or Asymmetrical Distributions. A frequency distribution or curve
is said to be skewed when it departs from symmetry. Here the frequencies tend to pile up at
one end or the other end of the distribution or curve. This is the most common type
encountered in practice.

iii) The Extremely Skewed or J-shaped Distributions. Here the frequencies run up to a
maximum at one end of the range, having the shape of the letter J or its reverse.

iv) The U-shaped Distributions. In such frequency distributions or curves, the maximum
frequencies occur at both ends of the range and a minimum towards the centre, shaped more
or less like the letter U. A distribution of this type is rare.


2.8.8 Ratio Charts or Semi-logarithmic Graphs. In the ordinary types of graph, the scales used
are called the natural scales or the arithmetic scales. These graphs can only be used to compare the
absolute changes in values because the ordinary graph paper, known as arithmetic paper, is so ruled
that equal intervals anywhere on the paper represent equal differences or amounts. More often we are
interested in studying the relative changes or ratios. The relative changes or ratios can be displayed and
compared by the slope of a straight line when the logarithms of the values are plotted on an arithmetic
paper. In practice, the difficulty of looking up logarithms can be dispensed with by using another type of
graph paper, called Semi-logarithmic paper or Ratio paper. A semi-logarithmic paper or ratio paper is so
ruled that equal intervals on the vertical axis indicate equal ratios or rates of change, while equal
intervals on the horizontal axis indicate equal differences or amounts of change. Thus the essential
feature of a Semi-logarithmic chart is that one axis has a logarithmic scale and the other has an arithmetic
scale.


Graphs obtained by plotting the values on a semi-logarithmic paper or ratio paper and joining the
successive points by means of straight line segments are called Semi-logarithmic graphs or Ratio charts.
They are generally used when

i) the relative rates of change are to be compared;

ii) visual comparisons are to be made between two or more series which differ widely in
magnitude; and

iii) the data are to be examined to see whether they are characterized by a constant rate of change.

A ratio chart possesses the following characteristics:

i) There is no zero line on the logarithmic scale as the logarithm of zero is minus infinity.

ii) A geometric progression, when plotted on semi-logarithmic paper, forms a straight line, as the
logarithms of a geometric progression form an arithmetic progression.

iii) The slope of the logarithmic scale variable indicates the rate at which the variable is changing
(i.e. increasing or decreasing).

iv) In case of two or more curves, the curve having the steepest slope has the largest rate of
change.

v) Equal slopes (in case of parallel curves) indicate equal rates of change.
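
The straight-line property of a geometric progression on a logarithmic scale can be checked numerically with the Python sketch below. Both series are hypothetical and the helper name `log_differences` is introduced for this illustration.

```python
import math

def log_differences(series):
    """Successive differences of log10 values, i.e. the slopes on a ratio chart."""
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

# Hypothetical series: one doubles every period, the other grows by a fixed amount.
geometric = [100 * 2 ** t for t in range(6)]     # 100, 200, 400, ...
arithmetic = [100 + 100 * t for t in range(6)]   # 100, 200, 300, ...

geo_steps = log_differences(geometric)
ari_steps = log_differences(arithmetic)

# Constant rate of change: equal steps in the logarithms, hence a straight line.
print([round(d, 4) for d in geo_steps])
# Constant absolute change: the steps in the logarithms shrink, so the curve flattens.
print([round(d, 4) for d in ari_steps])
```

Equal log-differences correspond to equal ratios between successive values, which is why parallel lines on a ratio chart indicate equal rates of change.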


EXERCISES 


OBJECTIVE 


a) Answer 'True' and 'False'. If the statement is not true then replace the underlined words with
words that make the statement true:


i) The term cross-sectional data refers to data that may change over time.
ii) The frequency distribution represents data in a condensed form.
iii) The data presented in an array does not allow us to locate the largest and smallest values in
the data.
iv) The classes in any frequency distribution are generally not mutually exclusive.
v) For nominally or ordinally scaled data the frequency distribution cannot be constructed.
vi) Frequency distribution of continuous data may be
vii) Frequency distribution can be presented graphically.
viii) Time series data can be tabulated using frequency distributions.
ix) Simple bar diagram is used for two dimensional comparisons.
x) The width of a bar in histogram represents the frequency rather than the value of a variable.
xi) Class marks are the lower limits of each class.

* xii) ~The lower class limit is ine idle possible data value for a class. 
xiii) The sum of relative frequencies in a relative frequency distribution should always equal 100.
xiv) A pie chart can be used to display quantitative data.
xv) The shape of the frequency distribution and the relative frequency distribution always will
be the same.
b) MULTIPLE CHOICE QUESTIONS.
i) Which of the following is not an example of condensed data?
a) frequency distribution b) data array c) histogram d) polygon
ii) In the construction of a frequency distribution the steps are to:
a) decide the number of classes
b) arrange the data in ascending / descending order
c) locate the smallest and largest values in a data set
d) all of above
iii) The number of classes in a frequency distribution generally should be
a) less than five b) more than twenty
c) between five and twenty d) between ten and twenty

iv) As the number of observations and classes increase, the shape of the frequency polygon:
a) remains same b) tends to smooth
c) becomes more erratic d) none of them

v) A cumulative frequency distribution is graphically represented by:
a) frequency curve b) frequency polygon
c) pie chart d) ogive

vi) A relative frequency distribution presents frequencies in terms of:
a) whole numbers b) percentages
c) fractions d) all of above

vii) A diagram that presents properties that look like slices of a pie is known as:

а) a bar diagram b)a г bar diagram 
c) a histogram d) a pie diagram

viii) Observed data organized into tabular form is called:
a) a bar chart b) a pie chart
c) a frequency polygon d) a frequency distribution

ix) The number of occurrences of a data value is called:
a) the frequency b) the cumulative frequency
c) the relative frequency d) all of above

x) In the following


The number that occurred the most is
a) 2 b) 55 c) 42 d) 5


SUBJECTIVE

2.1 Explain what is meant by classification. What are its basic principles?

2.2 Define the terms "Classification" and "Tabulation". Outline the main steps in tabulation. What
do you mean by captions, stubs, title and prefatory notes? (P.U., B.A./B.Sc.)

2.3 What is a statistical table? What are different types of tables? Explain the different parts of a
table and the main points to be kept in mind in their construction. (P.U., B.A./B.Sc.)


2.4 Represent the data given in the following paragraph in the form of a table, so as to bring out
clearly all the facts, indicating the source and bearing a suitable title:


"According to the census of Manufacturers Report 1945, the John Smith Manufacturing
Company employed 400 non-union and 1,250 union employees in 1941. Of these 220 were
females of which 140 were non-union. In 1942, the number of union employees increased to
1,475 of which 1,300 were males. Of the 250 non-union employees 200 were males. In 1943,
1,700 employees were union members and 50 were non-union. Of all the employees in 1943,
250 were females of which 240 were union members. In 1944, the total number of employees
was 2,000 of which one percent were non-union. Of all the employees in 1944, 300 were
females of which only 5 were non-union."


2.5 a) Write short notes on:
Class-frequency, Class-Interval, Class limits, Class Marks, Size of Class-Interval and
Sturges' rule.


b) Determine class boundaries, class limits and class marks for the first and last classes in
respect of the following:
i) Weights of 300 entering Ў, ranged from 98 to 226 pounds, correct 
nearest pound. 
ii) The thickness of 464% ers ranged from 0.421 to 0.563 inches, 
c) A sample consists of RP observations, each recorded 45 correct to the nearest i 


ranging in value fi 1 to 337. If it is decided to use seven classes of width 20 
to begin the бїз at 199.5, find the class boundaries, limits and marks of the 


classes. X 
x (LU. M.A. Econ., 


a) What is meant by a frequency distribution? Describe briefly the main steps 
preparation of a frequency table from raw data. 
(P.U., B.A./B.Sc. 


b) Prepare a frequency table for the price data given below, taking 5 units as the
class-interval.


100 96 92 88 86 84 82 80 78 91 
877 1552 MEIST 14432 71: 6%, "58. 56 
73 7 90 753 55 7 334 5]1- 48. 46 63% 59 
3572051. IO РР ИУ ИЗ. AR 38: 5420 50 
$6, 4 42 40 38 36 46 53 50 4
a) Why are frequency distributions constructed? What are the rules to be observed in making
a frequency distribution from ungrouped data?


b) A record was made of the number of absences per day from a factory over 35 days with 
the following results: 


[Table: number of absences per day and the corresponding number of days; values not recoverable from the source.]


i) On how many days were there fewer than 4 people absent?
ii) On how many days were there at least 4 people absent?
iii) What is the total number of absences over the whole 35 days?
(M.A. Econ. II Semester, 1980)


a) Describe the steps you would take to construct a frequency distribution.
b) Tabulate the following marks in a grouped frequency distribution.
74 49 103 95 90 18 52 88 101 96 72 56 110 97 
59 62 96 82 65 85 105 116 91 83 99 S®y76 84 89 


77 104 96 84 62 58 66 100 CAKE E г, 
66 96 83 57 60 Я 114 120 121 nah 63 95 78 


The following data give the index numbers о e: in a certain year. Form à 
grouped frequency distribution, taking 5 as ia 
у" 


4 
Й 


91 124 109 129 141 қо 76 118 1и se 
99 99 114 100 P 08 5 10 101 71 175 
63 121 122 111 qe" 77 1027 6 13 6 ? 
77 177 10 99 96 96 86 16 119 79 4 


817127 86,453 79 129 151 89 143 147 ЦА 
90 142 1А 94 125 96 99 138 145 113 T 
09 87 M3 110 144 91 106 104 97 115 4 
100 117 73 134 108 102 123 106 119 104 
100 120 112 138 140 103 96 136 78 83 
75 100 113 14 109 116 109 116 104 128 


Arrange the data given below in an array and construct a frequency distribution, using a class
interval of 5.00. Indicate the class boundaries and class limits clearly.


794 716 95.5 730 742 818 906 55.9 
752 819 680 742 807 657 676 829 
881 778 694 83.2 827 738 642 639 


683 486 835 708 721 716 594 776 
(B.LS.E. Lahore, 1972) 


48-----2-- --3----- INTRODUCTION TO STA 
The following figures give the number of children born to 50 women: 
2 6 l 5 4 3 3 8 3 1 
4 3 3 0 5 1 4 3 3 
5 З 6 3 3 2 2 7 3 
1 à 2 4 4 4 6 8 10 7 
7 5 6 5 3 2 3 9 2 2 


Construct an ungrouped frequency distribution of these data.


Count the number of letters in each word of the following passage, hyphenated words
being treated as single words and make a frequency distribution of word length.


"To forgive an injury is often considered to be a sign of weakness; it is really a
strength. It is easy to allow oneself to be carried away by resentment and hate into
vengeance; but it takes a strong character to restrain those natural passions. The
forgives an injury proves himself to be the superior of the man who wronged him, and
wrong-doer to shame. Forgiveness may even turn a foe into a friend. So mercy is the
form of revenge."
The weights of 50 football players are listed below: 
193 240 217 283 268 212 A 263 275 208 
230 288 259 225 252 9 243 247 280 234 
250. 236,277 A 245 231 269 224 259 
258 231 255 g 245 246 271 249 255 
7265 235 243 $ 5 245 238 257 254 284 ! 
Make a stem-and-leaf display of the data and convert it to a frequency table with 10
beginning with 190.


Make a stem-and-leaf display of the following data. Using 8.0 as the lower limit of
class and with a width of 1 unit, convert it to a frequency distribution.


ES an ua 21 107 138 108 
N 1 


13.6 16.4 11.0 15,8 9.3 137 

11.0 8.0 12.0 11.5 97 11.6 
10.1 14.1 10.0, 9.9 134 15-7 (1,5 
123 9,8 13.0 91 8.2 12.9 14.0 
10.5 13.2 10.5 10,6 12:5 15.1 12.8 
10.4 11:2 93 1L7 177 13.9 16.9 
i34 118 168 142 11.8 96 11.9. 
87 14.7 10.9 17,9 11:5 147 15.9 
LER 10.6 12.6 12.6 15.7 14,9 9,9 


Describe the advantages and disadvantages of diagrammatic representation, Describe 
the important types of diagrams 


Describe each of the diagrams listed below and give an illustration in each case:


Bar diagrams; Multiple bar diagrams;
Pie-diagrams; Pictograms; and Profit and Loss charts.
(P.U., B.A./B


2.17 Give a description of various graphic and pictorial aids for representing data. Mention
particular uses of some methods. (P.U., B.A./B.Sc. 1961)


2.18 Describe briefly the different types of diagrams generally used for presenting statistical data.
State advantages and disadvantages of any three of them giving illustrations where possible.

2.19 Represent the following yield per acre data by a bar diagram.

Years: 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 

Yield Per àcre:- 5 7 9 6 10 12 $ 11 12 10 


2.20 Following table gives the birth rates and death rates per thousand of a few countries.
Represent them by multiple bar charts.


India 
Japan 
Germany 
Egypt , 
Australia 
New Zealand 
France 
Russia 


2.21 Represent the following data by rectangular diagram showing percentage of Income spent by
two families on different items of expenditure.


Family-budgets of families 


Income Rs.40 
Actual Expenses 
Food ЖЕ 1 Rs 20 
Clothing А x Rs. 8 
Shelter я Rs. 4 
Fuel and TU | Rs.2 
Miscellaneous 


2.22 The following table gives the details of expenditure of three families. Represent the
data by a suitable diagram on percentage basis.


Food АЗЫ 
Clothing 
Recreation 
Education 
Rent 
Miscellaneous 


2.23 Represent the following data by means of a pictogram:


2.24


Б) Graph the following Showing the areas in millions of square miles of the 
the world, using chart, (ii) a pie chart. 


2.25 a) The area sown in Rabi Crop is as follows: Prepare a Pie-chart.


b) Calculate the per cent contribution of each crop to the total Rabi crops.
(P.U., B.A./B.Sc., 1
2.26 Represent the following data by sub-divided bars drawn on a percentage basis or by a
Pie-diagram.
Cost per ton disposed commercially 


Wages 


Other costs 
| Royalties 
Total 


Sale proceeds per ton 


Profit (+) or loss (-) per ton


2.27 a) Describe the Graphical Methods used in Statistics, giving their respective advantages and
disadvantages. (P.U., M.A. Econ. 1981)


b) Draw up a list of rules for the construction of graphs.
(P.U., B.A. (Optional), 1969)


2.28 State the general rules which should be borne in mind in the construction of graphs. Draw a
suitable graph of the following time series.


2.29 Show graphically the following monthly imports and exports of a particular commodity during
the year 1960-61. Also show graphically the balance of trade. Imports and exports are given in
crores of rupees.


August 


September 
October 
November 
December 
January 


February 
March 


2.30 a) Explain with the help of diagrams the difference between a frequency polygon, a
histogram and an ogive. (P.U., B.A. (Part-I), 19


b) Construct (i) a Histogram, (ii) a Relative frequency polygon and (iii) an Ogive for the
following frequency distribution of the heights of 100 male students at Islamia University,
Bahawalpur.


(I.U., M.A., Econ. 19


2.31 What is meant by a Histogram? Draw a histogram for the distribution of earnings (Rs.) given
below:


Ex 180-184 | 185-189 = 200.204 | 205-209 | 210-214 | 215-219 
40 29 23 Ж. 


State how you would construct the histogram if the class-intervals were unequal in size.
(Engg. University, B.Sc. Final, 19


2.32 a) Define the statistical term Histogram. 
b) Explain the method of constructing histograms when the class intervals are unequal 


c) In a savings group, there are 400 members and the number of savings certificates held by
them are shown in the following table.

No. of certificates held
1 - 50
51 - 100
101 - 150
151 - 200
201 - 300
301 - 400
401 - 500


Construct a histogram of the distribution of savings certificates. (P.U., B. Com., 1961)


No. of members 


2.33 Draw a Histogram and a Frequency Polygon for the following distribution.
Degree of Cloudiness | 100 9 Raroa Os, nt Ub ws 1 0 
Frequency: „580 150 196 75 55 40 4S ~8 75 130 220 

eS (Р... B.A. (Part-1); 1962) 


E Draw a MEUS illustrating the following data: gv’ 


Ra Ta [s Ten 
Nh Га Геи 
м, 


b) Given the following frequency of the heights in centimeters of 1,000 students, draw 
its histogram showing the c x 


| Number of men 


x 185-157, 158-166 
4 NS 26 53 89 146 188-181 125 92 60 224 1 1 
SS (P.U., В.А /B.Sc., 1971) 


a) What is a cumulative frequency curve? How does it differ from the ordinary curve?
(P.U., M.A. Econ. 1985)


b) Describe the common types of frequency curves. Indicate their shapes.
Construct an Ogive from the following table.


Weights | Frequency
118 - 126 | 3
127 - 135 |
136 - 144 |
145 - 153 | 12
154 - 162 | 5
163 - 171 | 4
172 - 180 | 2


(P.U., B.A./B.Sc., 1967)


237 


2.41 


2.42 


Time taken (minutes) | <5, <10, <15, <20, <25, <30, <35, <40, <45
Cum. Frequency (F) 28 45 81 143 20 349 374 395 40 


Pupils were asked how long it took them to walk to school on a particular 
cumulative frequency distribution was formed. 


а) Draw a cumulative frequency curve and estimate how many pupils took less 4 
minutes, 


b) 6% of the pupils took x minutes or longer. Find x. 


c) Take equal class-intervals of 0—, 5—, 10-, étc., construct a frequency distribution an 

a histogram. 1 
Describe in detail the method of drawing Ratio Charts and explain their uses in ссе 
statistics. (P.U., М.А. Econ, 


Explain what is meant by a ratio chart and discuss its advantages over the natural 
diagram. Describe and illustrate two practical applications of a ratio chart. 


Toss five coins together and note the number of heads x. Do this 64 times and count the
number of times that x = 0, 1, 2, 3, 4, 5. Construct a frequency polygon and an Ogive to
represent these results.


a) Distinguish between primary and secondary data. Describe briefly the methods of
collecting primary data.

b) Find the missing entries in the following frequency distribution table.


ve Cumulative Cumulative 
Frequenc Percentage 
se 
E X cB 


ҳу (P.U., ВА./В.5с., 


The following data give the annual earnings (rounded to thousands of dollars) 
households. 
2057409 “ТАР PST ше. АДК |, 
13: 52/5425 17" 20. И5: 45 22 
78 79 99 30 80 80 75 63 
24 35, 25- 90 :35: 33. 70 63 
352 0317 „9 
i) Prepare a stem-and-leaf display for these data.
ii) Condense the stem-and-leaf display by grouping the stems as 0-2, 2-5 and 6-9. 
(P.U., B.A./B.Sc., 


3.1 INTRODUCTION


For practical purposes, the condensation of a data set into a frequency distribution and its visual
presentation are not enough, particularly when two or more different data sets are to be compared. A data
set can be summarized in a single value. Such a value, usually somewhere in the centre and representing
the entire data set, is a value at which the data have a tendency to concentrate. The tendency of the
observations to cluster in the central part of the data set is called Central Tendency and the summary
value is called a measure of central tendency. Since a measure of central tendency indicates the location or
general position of the distribution or the data set in the range of observations, it is also known as a
measure of location or position. The measures of central tendency or location are generally known as
Averages. But in everyday language, ‘the average’ is often understood to refer to the arithmetic mean
(a form of average to be discussed in section 3.4); it is for this reason that when anyone speaks of ‘the
average’ (without qualification) of a set of observations, it may, as a rule, be assumed that the arithmetic
mean is meant. The use of the term average has been traced to the time of Pythagoras (570-500 B.C.).
Two points should be noted. First, a measure of central tendency should be somewhere within the range
of the data, and secondly, it should remain unchanged by a rearrangement of the observations in a
different order.


Since the late nineteenth century, the practice has been to make a distinction between a sample and
a population from which the sample is drawn, by using Latin letters for numerical quantities describing
a sample and Greek letters for corresponding quantities characterizing the population. It should be
noted that population parameters are rarely calculated directly, as all observations from the population are
not usually available. The measures corresponding to these parameters are generally calculated from
sample data and are regarded as the estimates of population parameters.


3.2 CRITERIA OF A SATISFACTORY AVERAGE


Several types of averages are devised to measure the representative or "typical" value of a set of
data or distribution. It is therefore desirable that an average should be


i) rigorously defined,
ii) based on all the observations made,
iii) simple to understand and easy to interpret,
iv) quickly and easily calculated,
v) amenable to mathematical treatment,
vi) relatively stable in repeated sampling experiments, and
vii) not unduly influenced by abnormally large or small observations.
An average that possesses all or most of the conditions stated above is considered a satisfactory
average.
3.3 TYPES OF AVERAGES


The most common types of averages are (i) the arithmetic mean or simply the mean, (ii) the
geometric mean, (iii) the harmonic mean, (iv) the median and (v) the mode. The first three types are
mathematical in character and give an indication of the magnitude of the observed values. The fourth type
indicates the middle position while the last provides information about the most frequent value in the
distribution or the data set.


3.4 THE ARITHMETIC MEAN 


The arithmetic mean or simply the mean is the most familiar average. It is defined as a value
obtained by dividing the sum of all the observations by their number, that is

$$\text{Mean} = \frac{\text{Sum of all the observations}}{\text{Number of the observations}}$$


The mean may correspond to either a population or a sample from the population. If the given set
of observations represents a population, the mean is called the population mean, which is traditionally
denoted by $\mu$ (the Greek letter mu). Thus the population mean of a set of N observations $x_1, x_2, \ldots, x_N$ is
given as

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}$$

where $\Sigma$, the Greek capital sigma, is a convenient symbol for summation.
If, instead, the given set of observations represents a sample, the mean is called the sample mean,
usually denoted by placing a bar over the symbol used to represent the observations or the variable. Thus
the mean of a set of n observations $x_1, x_2, \ldots, x_n$ is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

where $\bar{x}$ is the mean of a sample of size n.


It is worthwhile to note that the population mean $\mu$ is a fixed quantity, whereas $\bar{x}$, the sample mean,
is a variable because different samples from the same population tend to have different means.


In order to interpret the meaning of the arithmetic mean, let $x_i$ denote the marks obtained by the ith
student in a class. Then $\sum x_i$ stands for the total marks obtained by all students and $\bar{x}$, the mean,
represents the number of marks that would have been obtained by each student if everyone in the class
had obtained the same number of marks. Geometrically the mean represents a point at which the
distribution or the set of observations would balance.


Example 3.1 The marks obtained by 9 students are given below: 
45, 32, 37, 46, 39, 36, 41, 48, 36. 
Calculate the arithmetic mean. 
The mean is given by

$$\bar{x} = \frac{\sum x_i}{n} = \frac{45+32+37+46+39+36+41+48+36}{9} = \frac{360}{9} = 40 \text{ marks}$$

It is relevant to note that, if these marks represent the entire set of observations for the population,
the above calculation gives the population mean, i.e. $\mu$ would equal 40 marks.
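
The calculation in Example 3.1 can be checked with a few lines of Python. This is only an illustrative sketch; the function name `arithmetic_mean` is an assumption introduced here and does not come from the text.

```python
# Marks of the nine students in Example 3.1.
marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 40.0

# Rearranging the observations leaves the mean unchanged.
print(arithmetic_mean(sorted(marks)) == mean_marks)  # True
```

The second check mirrors the remark in Section 3.1 that a measure of central tendency should not depend on the order of the observations.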


3.4.1 The Weighted Arithmetic Mean. The multipliers or a set of numbers which express more
or less adequately the relative importance of various observations in a set of data are technically called
weights. We assign weights $w_1, w_2, \ldots, w_n$ to the observations in a set of data according to their
relative importance, when the observations are not of equal importance. The weighted mean, denoted
by $\bar{x}_w$, of a set of n values $x_1, x_2, \ldots, x_n$ with corresponding weights $w_1, w_2, \ldots, w_n$ is then defined as

$$\bar{x}_w = \frac{x_1 w_1 + x_2 w_2 + \cdots + x_n w_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum x_i w_i}{\sum w_i}, \qquad (i = 1, 2, \ldots, n)$$

A weighted average is generally employed in the calculation of index numbers, birth and death
rates, etc.
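
A minimal sketch of the weighted mean formula follows. The data are hypothetical (three price relatives with made-up weights) and the function name `weighted_mean` is an assumption used only for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(x_i * w_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# HYPOTHETICAL data: three price relatives and their relative importance.
price_relatives = [120, 150, 110]
weights = [5, 3, 2]

print(weighted_mean(price_relatives, weights))  # 127.0

# With equal weights the weighted mean reduces to the ordinary mean.
print(weighted_mean([1, 2, 3], [1, 1, 1]))  # 2.0
```

The second call shows that the ordinary arithmetic mean is the special case in which every observation carries the same weight.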


[Table residue from a worked example on the weighted mean; only the row labels Rent, Clothing, and Fuel and Light survive in the source.]

3.4.2 Properties of the Arithmetic Mean. The arithmetic mean has the following four properties:
i) For a set of data, the sum of the deviations of the observations $x_i$ from their mean, $\bar{x}$, taken
with their proper signs, is equal to zero.

The sum of the deviations $= \sum (x_i - \bar{x})$, $(i = 1, 2, \ldots, n)$

$= \sum x_i - n\bar{x}$ ($\because \bar{x}$ is constant)

$= \sum x_i - \sum x_i = 0$ ($\because \bar{x} = \sum x_i / n$)

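ii) The sum of squared deviations of the $x_i$'s from the mean, $\bar{x}$, is a minimum. In other words,
$\sum (x_i - \bar{x})^2 \le \sum (x_i - a)^2$, where a is an arbitrary value other than the mean.


Now $\sum (x_i - a)^2 = \sum (x_i - \bar{x} + \bar{x} - a)^2$

$= \sum \left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - a) + (\bar{x} - a)^2\right]$

$= \sum (x_i - \bar{x})^2 + 2(\bar{x} - a)\sum (x_i - \bar{x}) + n(\bar{x} - a)^2$

$= \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2$ $\quad [\because \sum (x_i - \bar{x}) = 0]$

It is obvious that $\sum (x_i - a)^2$ exceeds $\sum (x_i - \bar{x})^2$ by $n(\bar{x} - a)^2$. The equality holds only when $\bar{x} = a$.
Hence $\sum (x_i - \bar{x})^2$ is always less than $\sum (x_i - a)^2$ if $a \neq \bar{x}$.
This property is usually called the minimal property of the mean.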
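
Properties (i) and (ii) can be checked numerically. The sketch below uses the nine marks from Example 3.1; the helper name `sum_squared_deviations` and the trial values of a are assumptions chosen for illustration.

```python
marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
n = len(marks)
mean = sum(marks) / n  # 40.0

# Property (i): deviations from the mean add up to zero.
sum_of_deviations = sum(x - mean for x in marks)
print(sum_of_deviations)  # 0.0


def sum_squared_deviations(values, a):
    """Sum of squared deviations of the values from an arbitrary point a."""
    return sum((x - a) ** 2 for x in values)


# Property (ii): the sum of squares about a exceeds the sum of squares
# about the mean by exactly n * (mean - a) ** 2.
minimum = sum_squared_deviations(marks, mean)
for a in (35, 38, 40, 42.5):  # arbitrary trial values
    about_a = sum_squared_deviations(marks, a)
    print(a, about_a, minimum + n * (mean - a) ** 2)
```

Each printed row shows the two sides of the identity derived above, so the last two columns agree for every trial value of a.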


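iii) If k subgroups of data consisting of $n_1, n_2, \ldots, n_k$ ($\sum n_i = n$) observations have respective
means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, then $\bar{x}$, the mean for all the data, is given by

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}, \qquad (i = 1, 2, \ldots, k)$$

i.e. a weighted mean of the subgroup means.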
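
The combined-mean formula can be verified against a direct calculation on pooled data. The three small subgroups below are hypothetical and the function name `combined_mean` is an assumption made for this sketch.

```python
def combined_mean(sizes, means):
    """Overall mean from subgroup sizes n_i and subgroup means."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# HYPOTHETICAL subgroups of unequal size.
groups = [[2, 4, 6], [10, 20], [5]]
sizes = [len(g) for g in groups]
means = [sum(g) / len(g) for g in groups]

pooled = [x for g in groups for x in g]
direct_mean = sum(pooled) / len(pooled)

print(combined_mean(sizes, means), direct_mean)
```

Both printed values are the same up to rounding, since weighting each subgroup mean by its size restores the subgroup totals.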
iv) If $y_i = a x_i + b$ $(i = 1, 2, \ldots, n)$, where a and b are any two numbers and $a \neq 0$, then
$\bar{y} = a\bar{x} + b$.
Now summing over all values of i, we obtain
$$\sum y_i = a \sum x_i + nb$$
Dividing both sides by n, we get
$$\bar{y} = a\bar{x} + b$$


As the equation $y = ax + b$ represents a linear transformation from x to y, this property is usually called
the invariance of the mean under a linear transformation and it provides the basis for so-called coding, which
refers to the operation of subtracting (or adding) a constant from each observation and then
dividing (or multiplying) by another constant.

Example 3.3 The mean heights and the number of students in three sections of a statistics class are
given below. Find the overall mean height of 120 boys.

Here $n_1 = 40$; $n_2 = 37$; $n_3 = 43$, and
$\bar{x}_1 = 62''$, $\bar{x}_2 = 58''$, $\bar{x}_3 = 61''$.
The mean height of the combined class is given as

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{(40 \times 62) + (37 \times 58) + (43 \times 61)}{40 + 37 + 43} = \frac{7249}{120} = 60.41''$$


3.4.3 Mean From Grouped Data. When the number of observations is very large, the data are
organized into a frequency distribution, which is used to calculate the approximate values of descriptive
measures, as the identity of the observations is lost. To calculate the approximate value of the mean, the
observations in each class are assumed to be identical with the class midpoint, so that the product of the
midpoint by the number of observations in that class would be approximately equal to the sum of the
observations for each class. Thus, if a frequency distribution has k classes with midpoints $x_1, x_2, \ldots, x_k$
and corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the mean is then given by the formula

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n}$$

Since the frequency $f_i$ indicates the number of times an observation is to be counted, the mean calculated from a
frequency distribution may also be regarded as the weighted mean where each class midpoint $x_i$, taken
as the value of the observations in that class, is weighted by the respective frequency $f_i$ and the
sum of weighted products is divided by the sum of frequencies, i.e. weights.
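
The grouped-data formula can be sketched in Python as follows. The class limits are those of the weight distribution used in this chapter; the frequencies are an assumption, inferred from the tabulated products in the chapter's worked examples because the printed frequency column is incomplete. The function name `grouped_mean` is likewise introduced only for illustration.

```python
# Class limits (grams) of the weight distribution used in this chapter.
class_limits = [(65, 84), (85, 104), (105, 124), (125, 144),
                (145, 164), (165, 184), (185, 204)]

# ASSUMPTION: frequencies inferred from the tabulated products in the
# worked examples; they are not printed legibly in the source table.
frequencies = [9, 10, 17, 10, 5, 4, 5]


def grouped_mean(limits, freqs):
    """Approximate mean: each observation is replaced by its class midpoint."""
    if len(limits) != len(freqs):
        raise ValueError("one frequency is required for each class")
    midpoints = [(low + high) / 2 for low, high in limits]
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n


mean_weight = grouped_mean(class_limits, frequencies)
print(mean_weight)  # 122.5
```

Under these assumed frequencies the result agrees with the 122.5 grams obtained by the short method in Example 3.5.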

Sometimes, there may be a slight difference in the values of $\bar{x}$ on account of errors caused by the
assumption that all observations in any class may be treated as approximately the midpoint of that class.
Experience tells us that this error is usually small and never serious. The following example illustrates
this point.
Example 3.4 Calculate the mean weight of apples from the data given in Example 2.2
from the observed values and from the data grouped into a frequency distribution.


The calculations are outlined below: 


Weight (grams) | Sum of actual observations | $f_i$ | Mean weight of each class ($\bar{x}_i$) | $f_i \bar{x}_i$ | Midpoints ($x_i$) | $f_i x_i$
65 - 84
85 - 104
105 - 124
125 - 144
145 - 164
165 - 184
185 - 204

Calculation based on Ungrouped data


We calculate the mean weight, $\bar{x}$, directly from all the observed values, which add to 7324.1 (the
second column consists of subtotals of actual observations in each class).

$$\bar{x} = \frac{\sum x}{n} = \frac{7324.1}{60} = 122.07 \text{ grams}$$

This is the exact mean of the given data.


Next, we find the mean weight, $\bar{x}$, by multiplying the actual mean of the observations in each class
by the corresponding frequency, adding the products and then dividing by n (column 5).

Thus $$\bar{x} = \frac{\sum f_i \bar{x}_i}{n} = \frac{7324.1}{60} = 122.07 \text{ grams}, \qquad (i = 1, 2, \ldots, 7)$$


Calculation based on Grouped data


Here we calculate the mean weight from grouped data, assuming that all observations in any class
are identical with the midpoint of that class. The sixth column consists of class midpoints, $x_i$, and the
products are given in column 7.

Then $$\bar{x} = \frac{\sum f_i x_i}{n}, \qquad (i = 1, 2, \ldots, 7)$$

It should be noted that the numerical value of $\bar{x}$ calculated from the frequency distribution is
slightly different from the value obtained directly from the ungrouped data.

3.4.4 Change of Origin and Scale. To reduce the computational labour and to save time, a change
of origin and scale can be made. If $x_i$ denotes an observed value, and a and b are two constants with
$b \neq 0$, then the operations $x_i + a$, $b x_i$ and $b x_i + a$ are known respectively as the change of origin, the
change of scale and both change of origin and scale.


Let a be an arbitrary origin, sometimes called the assumed mean, and let $x_i = a + h u_i$, where h denotes
the unit of measurement. Then its corresponding coded value is $u_i = \dfrac{x_i - a}{h}$.

Now $$\bar{x} = \frac{\sum x_i}{n} = \frac{\sum (a + h u_i)}{n} = \frac{na}{n} + \frac{h \sum u_i}{n} = a + h\bar{u}$$

Thus the arithmetic mean can be calculated from any origin we may choose and using any scale we
like.


This transformation is particularly useful for calculations based on grouped data, where h is the
width of the class interval and a is usually chosen as the class midpoint lying in the region of the higher
frequencies, so that the larger frequencies may be multiplied by smaller values of u. This procedure gives
a short method for hand calculations.
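
The short (coding) method can be sketched as below. The class midpoints follow from the class limits 65 - 84, ..., 185 - 204; the frequencies are an assumption inferred from the chapter's worked examples, and the function name `coded_mean` is hypothetical.

```python
# Midpoints of the classes 65-84, 85-104, ..., 185-204 (grams).
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]

# ASSUMPTION: frequencies inferred from the chapter's worked examples.
frequencies = [9, 10, 17, 10, 5, 4, 5]


def coded_mean(xs, freqs, a, h):
    """Mean via coded values u = (x - a) / h, i.e. a + h * (sum f*u / n)."""
    if h == 0:
        raise ValueError("the scale h must be non-zero")
    n = sum(freqs)
    coded = [(x - a) / h for x in xs]          # u_i for each class
    mean_u = sum(f * u for f, u in zip(freqs, coded)) / n
    return a + h * mean_u


# Assumed mean a = 114.5 and class width h = 20, as in Example 3.5.
print(round(coded_mean(midpoints, frequencies, a=114.5, h=20), 4))  # 122.5

# Any other origin gives the same mean, up to rounding.
print(round(coded_mean(midpoints, frequencies, a=134.5, h=20), 4))  # 122.5
```

Choosing a near the largest frequencies only keeps the hand arithmetic small; as the second call shows, the result does not depend on that choice.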


М 
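Example 3.5 Given the following frequency distribution of weights, calculate the mean weight by
the Short Method.


Let $u_i = \dfrac{x_i - a}{h}$, where a = 114.5 is the midpoint of the class 105 - 124 and h = 20 is the width of the class interval.

Weight (grams) | Midpoint ($x_i$) | $f_i$ | $u_i$ | $f_i u_i$
65 - 84 | 74.5 | 9 | -2 | -18
85 - 104 | 94.5 | 10 | -1 | -10
105 - 124 | 114.5 | 17 | 0 | 0
125 - 144 | 134.5 | 10 | 1 | 10
145 - 164 | 154.5 | 5 | 2 | 10
165 - 184 | 174.5 | 4 | 3 | 12
185 - 204 | 194.5 | 5 | 4 | 20
Total | | 60 | | 24

$$\bar{x} = a + h\frac{\sum f_i u_i}{n} = 114.5 + \frac{(20)(24)}{60} = 114.5 + 8.0 = 122.5 \text{ grams}$$


$= 8.95 + \dfrac{-2}{88} = 8.95 - 0.02 = 8.93$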


3.5 THE GEOMETRIC MEAN 


The geometric mean, G, of a set of n positive values $x_1, x_2, \ldots, x_n$ is defined as the positive nth
root of their product, i.e.

$$G = \sqrt[n]{x_1 x_2 \cdots x_n}, \qquad \text{where } x_i > 0$$

When n is large, the computation of the geometric mean becomes laborious, as we have to multiply
all the values and then extract the nth root. The arithmetic is simplified by using logarithms to the base
10. Thus, taking logarithms, we get

$$\log G = \frac{1}{n}\left[\log x_1 + \log x_2 + \cdots + \log x_n\right] = \frac{\sum \log x_i}{n}$$

Hence $$G = \text{antilog}\left[\frac{\sum \log x_i}{n}\right]$$

It means that the geometric mean is the anti-logarithm of the arithmetic mean of the logarithms of the
values themselves.


For data organized into a grouped frequency distribution, having k classes with class marks
$x_1, x_2, \ldots, x_k$ and the corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the formula for the geometric
mean is given by

$$G = \left[x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}\right]^{1/n}$$

In terms of logarithms, the formula becomes

$$\log G = \frac{1}{n}\left[f_1 \log x_1 + f_2 \log x_2 + \cdots + f_k \log x_k\right] = \frac{1}{n}\sum f_i \log x_i$$

or $$G = \text{antilog}\left[\frac{1}{n}\sum f_i \log x_i\right]$$

The weighted geometric mean of n values $x_1, x_2, \ldots, x_n$ with corresponding weights
$w_1, w_2, \ldots, w_n$ is given by

$$\log G_w = \frac{1}{\sum w_i}\left[\sum w_i \log x_i\right]$$

The geometric mean is appropriate to average ratios and rates of change.
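
The logarithmic form of the grouped geometric mean translates directly into code. In this sketch the midpoints belong to the weight classes 65 - 84, ..., 185 - 204; the frequencies are an assumption inferred from the chapter's worked examples, and `grouped_geometric_mean` is a hypothetical name.

```python
import math

# Midpoints of the classes 65-84, 85-104, ..., 185-204 (grams).
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]

# ASSUMPTION: frequencies inferred from the chapter's worked examples.
frequencies = [9, 10, 17, 10, 5, 4, 5]


def grouped_geometric_mean(xs, freqs):
    """Antilog of (1/n) * sum(f_i * log10(x_i)) for positive class marks."""
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean requires positive values")
    n = sum(freqs)
    mean_log = sum(f * math.log10(x) for f, x in zip(freqs, xs)) / n
    return 10 ** mean_log


print(round(grouped_geometric_mean(midpoints, frequencies), 1))  # 117.7
```

Working with logarithms avoids forming the very large product of the class marks raised to their frequencies.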
Example 3.7 Find the geometric mean of 45, 32, 37, 46, 39, 36, 41, 48 and 36.


The geometric mean, G, is calculated as

$$G = \sqrt[9]{45 \times 32 \times 37 \times 46 \times 39 \times 36 \times 41 \times 48 \times 36}$$

Taking logs, we have

$\log G = \frac{1}{9}[\log 45 + \log 32 + \log 37 + \log 46 + \log 39 + \log 36 + \log 41 + \log 48 + \log 36]$

$= \frac{1}{9}[1.65321 + 1.50515 + 1.56820 + 1.66276 + 1.59106 + 1.55630 + 1.61278 + 1.68124 + 1.55630]$

$= \frac{1}{9}[14.38700] = 1.59856$

Hence G = antilog (1.59856) = 39.68
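
The same result can be obtained in Python, both through logarithms and through the defining nth root. This is an illustrative sketch; `geometric_mean` is a name chosen here, and `math.prod` requires Python 3.8 or later.

```python
import math

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]


def geometric_mean(values):
    """Antilog of the arithmetic mean of the base-10 logarithms."""
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean requires positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


g_logs = geometric_mean(marks)
g_root = math.prod(marks) ** (1 / len(marks))  # direct nth root of the product

print(round(g_logs, 2), round(g_root, 2))  # 39.68 39.68
```

The two routes agree; the logarithmic route is the one that stays practical when n is large.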


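Example 3.8 Given the following frequency distribution of weights, calculate the geometric
mean.


Weight (grams) | Midpoint ($x_i$) | $f_i$ | $\log x_i$ | $f_i \log x_i$
65 - 84 | 74.5 | 9 | 1.8722 | 16.8498
85 - 104 | 94.5 | 10 | 1.9754 | 19.7540
105 - 124 | 114.5 | 17 | 2.0589 | 35.0013
125 - 144 | 134.5 | 10 | 2.1287 | 21.2870
145 - 164 | 154.5 | 5 | 2.1889 | 10.9445
165 - 184 | 174.5 | 4 | 2.2418 | 8.9672
185 - 204 | 194.5 | 5 | 2.2889 | 11.4445
Total | | 60 | | 124.2483


$$\log G = \frac{1}{n}\sum f_i \log x_i = \frac{124.2483}{60} = 2.0708$$

Hence G = antilog (2.0708) = 117.7 grams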


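3.6 THE HARMONIC MEAN


The harmonic mean, H, of a set of n values $x_1, x_2, \ldots, x_n$ is defined as the reciprocal of the
arithmetic mean of the reciprocals of the values. In symbols,

$$H = \text{Reciprocal of } \frac{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}{n} = \frac{n}{\sum \dfrac{1}{x_i}}, \qquad \text{where } x_i \neq 0$$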

The harmonic mean is an appropriate type to be used in averaging certain kinds of ratios or rates of
change. To illustrate this formula, let us take an example. Suppose a car is running at the rate of 15 km/hr
during the first 30 km; at 20 km/hr during the second 30 km; and at 25 km/hr during the third 30 km.
The distance is constant but the times are variable. Therefore, the harmonic mean is the correct average. In
this case, the harmonic mean is

$$H = \frac{3}{\dfrac{1}{15} + \dfrac{1}{20} + \dfrac{1}{25}} = \frac{3}{0.06667 + 0.05000 + 0.04000} = \frac{3}{0.15667} = 19.15 \text{ km/hr approximately.}$$
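
The car example can be checked by comparing the harmonic mean of the three speeds with the overall speed, total distance divided by total time. The sketch is illustrative; `harmonic_mean` is a name introduced here.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(x == 0 for x in values):
        raise ValueError("the harmonic mean is undefined for zero values")
    return len(values) / sum(1 / x for x in values)


speeds = [15, 20, 25]   # km/hr over three successive stretches
stretch = 30            # km covered at each speed

h = harmonic_mean(speeds)

# Overall speed = total distance / total time.
total_time = sum(stretch / v for v in speeds)   # hours
overall_speed = len(speeds) * stretch / total_time

print(round(h, 2), round(overall_speed, 2))  # 19.15 19.15
print(sum(speeds) / len(speeds))             # 20.0, the arithmetic mean
```

The arithmetic mean of 20 km/hr overstates the true average speed because the car spends more time on the slower stretches.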


Care should be exercised in applying the harmonic mean. The following rule will help determine the
application of the harmonic mean.


"When rates are expressed as x per y, and x is constant, the harmonic mean is required; but if y is
constant, the arithmetic mean is required."


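For data organised into a frequency distribution having k classes with class marks $x_1, x_2, \ldots, x_k$ and
corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the harmonic mean of the distribution is given by

$$H = \text{Reciprocal of } \frac{1}{n}\left[\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_k}{x_k}\right] = \frac{n}{\sum \dfrac{f_i}{x_i}}$$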
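
A short sketch of the grouped harmonic mean follows. The midpoints are those of the weight classes 65 - 84, ..., 185 - 204; the frequencies are an assumption inferred from the chapter's worked examples, and `grouped_harmonic_mean` is a hypothetical name.

```python
# Midpoints of the classes 65-84, 85-104, ..., 185-204 (grams).
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]

# ASSUMPTION: frequencies inferred from the chapter's worked examples.
frequencies = [9, 10, 17, 10, 5, 4, 5]


def grouped_harmonic_mean(xs, freqs):
    """n divided by the sum of f_i / x_i over all classes."""
    if any(x == 0 for x in xs):
        raise ValueError("class marks must be non-zero")
    n = sum(freqs)
    return n / sum(f / x for f, x in zip(freqs, xs))


print(round(grouped_harmonic_mean(midpoints, frequencies), 2))  # 113.11
```

For the same assumed distribution the harmonic mean (about 113.1) lies below the geometric mean (about 117.7), which in turn lies below the arithmetic mean (122.5).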


Example 3.9 Find the harmonic mean from the following frequency distribution of weights.

We calculate the harmonic mean as below:


Weight (grams) | Midpoint ($x_i$) | $f_i$ | $f_i / x_i$
65 - 84 | 74.5 | 9 | 0.12081
85 - 104 | 94.5 | 10 | 0.10582
105 - 124 | 114.5 | 17 | 0.14847
125 - 144 | 134.5 | 10 | 0.07435
145 - 164 | 154.5 | 5 | 0.03236
165 - 184 | 174.5 | 4 | 0.02292
185 - 204 | 194.5 | 5 | 0.02571
Total | | 60 | 0.53044

Hence $$H = \frac{n}{\sum \dfrac{f_i}{x_i}} = \frac{60}{0.53044} = 113.11 \text{ grams}$$

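Example 3.10 Compute the Geometric and Harmonic means for the following distribution of
annual death rates:

Death rate ($x$): 3.95 4.95 5.95 6.95 7.95 8.95 9.95 10.95 11.95 12.95 13.95
Frequency ($f$): 1 4 5 13 12 19 13 10 6 4 1

(B.I.S.E. Lahore


$x$ | $f$ | $\log x$ | $f \log x$ | $1/x$ | $f/x$
3.95 | 1 | 0.59660 | 0.59660 | 0.25316 | 0.25316
4.95 | 4 | 0.69461 | 2.77844 | 0.20202 | 0.80808
5.95 | 5 | 0.77452 | 3.87260 | 0.16807 | 0.84035
6.95 | 13 | 0.84198 | 10.94574 | 0.14388 | 1.87044
7.95 | 12 | 0.90037 | 10.80444 | 0.12579 | 1.50948
8.95 | 19 | 0.95182 | 18.08458 | 0.11173 | 2.12287
9.95 | 13 | 0.99782 | 12.97166 | 0.10050 | 1.30650
10.95 | 10 | 1.03945 | 10.39450 | 0.09132 | 0.91320
11.95 | 6 | 1.07740 | 6.46440 | 0.08368 | 0.50208
12.95 | 4 | 1.11229 | 4.44916 | 0.07722 | 0.30888
13.95 | 1 | 1.14459 | 1.14459 | 0.07168 | 0.07168
Total | 88 | | 82.50671 | | 10.50672


Now $$\log G = \frac{1}{n}\sum f \log x = \frac{82.50671}{88} = 0.93758$$

Hence G = antilog (0.93758) = 8.66, and

$$H = \frac{n}{\sum \dfrac{f}{x}} = \frac{88}{10.50672} = 8.38$$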

3.7 THE MEDIAN 


The median is defined as a value which divides a data set that has been ordered into two equal
parts, one part comprising observations greater than and the other part smaller than it. Or more
precisely, the median is a value at or below which 50% of the ordered data lie.


Thus the sample median of the n observations $x_1, x_2, \ldots, x_n$, when arranged in order from smallest
to largest, is the middle value if n is odd, and the average of the two middle values if n is even. Stated
differently, when $\frac{n}{2}$ is not an integer, the median is the $\left(\left[\frac{n}{2}\right] + 1\right)$th observation, and when $\frac{n}{2}$ is an integer, the
median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations, where $[x]$ denotes the largest integer in x.
The median in case of a discrete or ungrouped frequency distribution can be found as above by forming a
cumulative frequency distribution.

For data grouped into a frequency distribution, the median is a value or a point on the horizontal
axis through which a vertical line divides the histogram of the distribution into two parts of equal area.
In other words, the median is that value on the horizontal scale which corresponds to a cumulative
frequency $\frac{n}{2}$. This value would lie in a certain group, called the median group, but a single value for the
median is often desirable. To obtain this single value, we assume that the observations are evenly
distributed within the median group. It may be obtained as follows:


Let us consider a relevant portion of the cumulative frequency polygon as drawn below. Then the
median is the abscissa of the point M, that is

Median = OA + BD (see figure below).

[Figure: portion of the cumulative frequency polygon, with the cumulative frequency n/2 marked on the vertical axis and the median on the horizontal scale.]

Since BDM and BEF are similar triangles, therefore

$$\frac{BD}{BE} = \frac{DM}{EF} \qquad \text{or} \qquad BD = \frac{BE \times DM}{EF}$$

Now evidently BE is the width of the class-interval containing the median and hence is equal to h.

$DM = \frac{n}{2} - AB$, where AB represents the cumulative frequency corresponding to the group
preceding the median group. Let it be equal to C. Then $DM = \frac{n}{2} - C$.

EF is the difference between two cumulative frequencies, which is clearly the frequency
corresponding to the median group and is denoted by f. If OA = l (which is the lower boundary of the median
group), then substituting these values, we get the following formula

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$$

This process of determining the median is called linear interpolation and it does not require a
uniform class interval. If this arithmetical process is not used but the value of the median (on the x-axis)
corresponding to a cumulative frequency $\frac{n}{2}$ is read directly from the graph of the Ogive curve, the process is
called the location of the median graphically. The median is an average of position. It is also called a
partition value. The population median may be defined in the same way from all the observations in the
population.
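
The interpolation formula for the grouped median can be sketched as follows. The class boundaries correspond to the weight classes 65 - 84, ..., 185 - 204; the frequencies are an assumption inferred from the chapter's worked examples, and the function name `grouped_median` is hypothetical.

```python
# Lower class boundaries and widths for the classes 65-84, ..., 185-204.
lower_boundaries = [64.5, 84.5, 104.5, 124.5, 144.5, 164.5, 184.5]
widths = [20] * 7

# ASSUMPTION: frequencies inferred from the chapter's worked examples.
frequencies = [9, 10, 17, 10, 5, 4, 5]


def grouped_median(lowers, hs, freqs):
    """Median = l + (h / f) * (n / 2 - C), by linear interpolation."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for l, h, f in zip(lowers, hs, freqs):
        if f > 0 and cumulative + f >= half:
            # This is the median group: interpolate inside it.
            return l + (h / f) * (half - cumulative)
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


print(round(grouped_median(lower_boundaries, widths, frequencies), 2))  # 117.44
```

Here n/2 = 30 falls in the class 105 - 124 (l = 104.5, f = 17, C = 19). Because each class carries its own width h, the same function works when the class intervals are unequal.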

3.7.1 Quantiles. When the number of observations is quite large, the principle according to which
a distribution or an ordered data set is divided into two equal parts may be extended to any number of
divisions. The three values which divide the distribution into four equal parts are called the Quartiles.
These values are denoted by $Q_1$, $Q_2$ and $Q_3$ respectively. $Q_1$ is called the first or lower quartile and $Q_3$ is
known as the third or upper quartile. In other words, the quartiles $Q_1$, $Q_2$ and $Q_3$ are the values at or
below which lie respectively the lowest 25, 50 and 75 per cent of the data. Similarly, the nine values
which divide the distribution into ten equal parts are called Deciles and are denoted by $D_1, D_2, \ldots, D_9$,
while the ninety-nine values dividing the data into one hundred equal parts are called Percentiles and are
denoted by $P_1, P_2, \ldots, P_{99}$. The second quartile or the fifth decile or the fiftieth percentile is
identical with the median.

Quartiles, deciles, percentiles and other values obtained by equal subdivision of the given set of
data are collectively called Quantiles or sometimes Fractiles. The Quantiles should be calculated when
the number of observations is quite large.


It is interesting to note that all the Quantiles are percentiles. For example, the 3rd quartile is the
75th percentile and the 6th decile corresponds to the 60th percentile. We therefore use the following
formula to compute $P_j$, the jth percentile from a set of n observations, arranged in order from smallest to
largest.

i) When $\frac{jn}{100}$ is not an integer, the jth percentile is given as

$P_j$ = Observation with ordinal number $\left[\frac{jn}{100}\right] + 1$; and

ii) When $\frac{jn}{100}$ is an integer, the jth percentile is

$P_j$ = Average of the two observations with ordinal numbers $\frac{jn}{100}$ and $\frac{jn}{100} + 1$,

where $[x]$ stands for the largest integer in x.
In case of grouped data, Quantiles are calculated in the same way as the median.
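
The two-case percentile rule can be written as a small function. The sketch uses the nine marks from the chapter's examples; the function name `percentile` is an assumption, and integer arithmetic (`divmod`) is used so that the test "is jn/100 an integer?" is exact.

```python
def percentile(values, j):
    """jth percentile (1 <= j <= 99) by the ordinal-number rule in the text."""
    if not 1 <= j <= 99:
        raise ValueError("j must lie between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(j * n, 100)   # jn/100 = whole + remainder/100
    if remainder != 0:
        # jn/100 is not an integer: take observation number [jn/100] + 1.
        return ordered[whole]               # zero-based index of that ordinal
    # jn/100 is an integer: average observations number jn/100 and jn/100 + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]

print(percentile(marks, 50))  # 39  (median)
print(percentile(marks, 25))  # 36  (first quartile)
print(percentile(marks, 75))  # 45  (third quartile)

# With an even number of observations the two middle values are averaged.
print(percentile(list(range(1, 11)), 50))  # 5.5
```

Quartiles and deciles follow by calling the function with j = 25, 50, 75 or j = 10, 20, ..., 90 respectively.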


Sir Francis Galton (1822-1911) is considered the originator of the terms median and percentiles,
though it was Gauss (1777-1855) who actually suggested the median.


Example 3.11 Given below are the marks obtained by 9 students:


45, 32, 37, 46, 39, 36, 41, 48, 36.

Find the median and the quartiles.
To find the median and the quartiles, we arrange the marks in order from lowest to highest. The
ordered marks are:


32, 36, 36, 37, 39, 41, 45, 46, 48.


Here n = 9 and $\frac{n}{2}$, i.e. $\frac{9}{2}$, is not an integer, therefore

Median = Marks obtained by the $\left(\left[\frac{n}{2}\right] + 1\right)$th student

= Marks obtained by the (4 + 1)th, i.e. 5th student in the ordered data

= 39 marks.


$Q_1$ = Marks obtained by the $\left(\left[\frac{n}{4}\right] + 1\right)$th student, as $\frac{n}{4}$ is not an integer

= Marks obtained by the (2 + 1)th, i.e. 3rd student

= 36 marks.

Similarly,

$Q_3$ = Marks obtained by the $\left(\left[\frac{3n}{4}\right] + 1\right)$th student

= Marks obtained by the (6 + 1)th, i.e. 7th student

= 45 marks.


Example 3.12 The following distribution relates to the number of assistants in 50 retail
establishments.

No. of assistants (x)     | 0 | 1 | 2 | 3 | 4  | 5 | 6 | 7 | 8 | 9 |
No. of establishments (f) | 3 | 4 | 6 | 7 | 10 | 6 | 5 | 5 | 3 | 1 |

Find the median number of assistants. Also calculate the quartiles and the 7th decile.

This is an example of ungrouped frequency distribution with unit class interval. To find the
median, the quartiles and the 7th decile for such a distribution, we cumulate the frequencies as shown in
the table.

No. of assistants (x) | 0 | 1 | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  |
Frequency (f)         | 3 | 4 | 6  | 7  | 10 | 6  | 5  | 5  | 3  | 1  |
Cumulative Frequency  | 3 | 7 | 13 | 20 | 30 | 36 | 41 | 46 | 49 | 50 |

Since $\frac{n}{2}$, i.e. $\frac{50}{2}$ is an integer, therefore the median is the average number of assistants in the
$\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th, i.e. 25th and 26th establishments. Looking at the cumulative frequencies,
we find that these two values correspond to the same value of x, i.e. 4.
Hence median number of assistants = 4.

For $Q_1$, we see that $\frac{n}{4}$, i.e. $\frac{50}{4}$ is not an integer, therefore

$Q_1$ = No. of assistants in $\left(\left[\frac{n}{4}\right] + 1\right)$th establishment
= No. of assistants in (12 + 1), i.e. 13th establishment
= 2 assistants.

Similarly,

$Q_3$ = No. of assistants in $\left(\left[\frac{3n}{4}\right] + 1\right)$th establishment as $\frac{3 \times 50}{4}$ is also not an integer
= No. of assistants in 38th establishment
= 6 assistants.

$D_7$ = Average number of assistants in the $\left(\frac{7n}{10}\right)$th and $\left(\frac{7n}{10} + 1\right)$th establishments as $\frac{7 \times 50}{10}$ is an
integer
= Average number of assistants in 35th and 36th establishments
= 5 assistants (since both values correspond to 5).


Example 3.13 Find the median, the quartiles and the 8th decile for the distribution of examination
marks given below:

Marks              | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99 |
Number of students |   8   |  87   |  190  |  304  |  211  |  85   |  20   |

(P.U., B.A./B.Sc. 1970)

To find the median marks, quartiles, etc. by the process of linear interpolation, the data are assumed
to be continuous. Thus we need the class boundaries as the cumulative frequencies correspond to the
class boundaries, i.e. to 39.5, 49.5, etc., and not to 39, 49, etc., the upper class limits. Hence the class
boundaries are shown in the first column and the last column of the table below consists of
cumulative frequencies.

Class-boundaries (marks) | Number of students (f) | Cumulative frequency
29.5 - 39.5              |   8                    |   8
39.5 - 49.5              |  87                    |  95
49.5 - 59.5              | 190                    | 285
59.5 - 69.5              | 304                    | 589
69.5 - 79.5              | 211                    | 800
79.5 - 89.5              |  85                    | 885
89.5 - 99.5              |  20                    | 905

Median = Marks obtained by $\left(\frac{n}{2}\right)$th student
= Marks obtained by $\frac{905}{2}$, i.e. 452.5th student which corresponds to marks in the class
59.5 - 69.5. Therefore

Median = $l + \frac{h}{f}\left(\frac{n}{2} - C\right)$, where the letters have their usual significance,
= $59.5 + \frac{10}{304}(452.5 - 285)$
= 59.5 + 5.5 = 65 marks.

And $Q_1$ = Marks of $\left(\frac{n}{4}\right)$th student
= Marks of $\frac{905}{4}$, i.e. 226.25th student which corresponds to a value in the class
49.5 - 59.5. Therefore

$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right) = 49.5 + \frac{10}{190}(226.25 - 95)$
= 49.5 + 6.9 = 56.4, i.e. about 56 marks.

Again $Q_3$ = Marks of $\left(\frac{3n}{4}\right)$th student
= Marks of $\frac{3 \times 905}{4}$, i.e. 678.75th student which lies in the class 69.5 - 79.5. Therefore

$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right) = 69.5 + \frac{10}{211}(678.75 - 589)$
= 69.5 + 4.25 = 73.75, i.e. about 74 marks.

And $D_8$ = Marks of $\left(\frac{8n}{10}\right)$th student
= Marks of $\frac{8 \times 905}{10}$, i.e. 724th student which also lies in the class 69.5 - 79.5.

Hence $D_8 = l + \frac{h}{f}\left(\frac{8n}{10} - C\right) = 69.5 + \frac{10}{211}(724 - 589)$
= 69.5 + 6.4 = 75.9, i.e. about 76 marks.
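The interpolation used for each quantile in Example 3.13 has the same form, so it can be sketched as one small Python function. The name `grouped_quantile` and its argument names are assumptions made for this illustration; the numbers passed in are the class boundaries, frequencies and cumulative frequencies quoted in the worked solution (n = 905).

```python
def grouped_quantile(l, h, f, c, position):
    """Linear interpolation inside the class that contains the quantile.

    l        -- lower class boundary of that class
    h        -- width of the class interval
    f        -- frequency of that class
    c        -- cumulative frequency of the classes before it
    position -- ordinal position of the quantile, e.g. n/2 for the median
    """
    return l + (h / f) * (position - c)


n = 905
median = grouped_quantile(59.5, 10, 304, 285, n / 2)
q1 = grouped_quantile(49.5, 10, 190, 95, n / 4)
q3 = grouped_quantile(69.5, 10, 211, 589, 3 * n / 4)
d8 = grouped_quantile(69.5, 10, 211, 589, 8 * n / 10)

for name, value in [("Median", median), ("Q1", q1), ("Q3", q3), ("D8", d8)]:
    print(f"{name} = {value:.2f} marks")
```

The function does not locate the class itself; as in the worked solution, the class is first found by inspecting the cumulative frequencies.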


3.8 THE MODE 


The French word mode meaning fashion, has been adopted to convey the idea of "most frequent".
The mode is defined as a value which occurs most frequently in a set of data, that is it indicates the most
common result. A set of data may have more than one mode or no mode at all when each observation
occurs the same number of times.

In an ungrouped frequency distribution with classes consisting of single values, the mode can be
immediately located by examining the distribution. For example, in the distribution relating to the number
of assistants in 50 retail establishments (Example 3.12) the mode is 4, as the frequency for x = 4 is greater
than for any other value of x.

When the data are organised into a grouped frequency distribution, the mode would lie in the class that
carries the highest frequency. This class is called the modal class. For most practical purposes, it is
sufficient to take the midpoint of the modal class as the mode but generally it is a poor approximation. It
therefore becomes desirable to decide at what point of the modal class, the mode should be located. To
meet this requirement a method based on three adjacent rectangles of the histogram with the tallest in the
middle, has been developed. The method is

Mode = $l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$,

where
$l$ = lower class boundary of the modal class,
$f_m$ = frequency of the modal class,
$f_1$ = frequency associated with the class preceding the modal class,
$f_2$ = frequency associated with the class following the modal class, and
$h$ = width of class interval.

The mode can also be calculated by the following formula:

Mode = $l + \frac{f_2}{f_1 + f_2} \times h$,

where the letters have their usual meaning. It should be noted that the first formula is more accurate and
should be generally used in calculating the mode.

When a frequency distribution is displayed as a smooth curve, the mode is the abscissa of the
maximum ordinate. A distribution having a single mode, is called a unimodal distribution, while a
distribution with two or more modes, is called a bimodal or multimodal distribution. It has no meaning for
U-shaped distributions. It should be remembered that, when a frequency distribution has classes of
unequal widths, the modal class is the class with maximum frequency per unit width. The mode should be
calculated if the frequency distribution has a class that carries more frequencies than the others and this
class should not be at the extreme of the distribution.


Example 3.14 Calculate the mode for the distribution of examination marks given in Example
3.13.

The class that carries the highest frequency is 59.5 - 69.5, which is thus the modal class.
Also $l$ = 59.5, $f_1$ = 190, $f_2$ = 211, $f_m$ = 304 and $h$ = 10.

Mode = $l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
= $59.5 + \frac{304 - 190}{(304 - 190) + (304 - 211)} \times 10$
= $59.5 + \frac{114}{207} \times 10$
= 59.5 + 5.5 = 65.0, i.e. 65 marks.
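Both mode formulas of Section 3.8 can be evaluated side by side with a short Python sketch. The function names `grouped_mode` and `grouped_mode_alt` are assumptions made for this illustration; the inputs are the values of l, f1, f2, fm and h stated in Example 3.14.

```python
def grouped_mode(l, h, fm, f1, f2):
    """First formula: l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    return l + (fm - f1) / ((fm - f1) + (fm - f2)) * h


def grouped_mode_alt(l, h, f1, f2):
    """Second formula: l + f2 / (f1 + f2) * h."""
    return l + f2 / (f1 + f2) * h


# Modal class 59.5 - 69.5 of the examination marks distribution.
first = grouped_mode(l=59.5, h=10, fm=304, f1=190, f2=211)
second = grouped_mode_alt(l=59.5, h=10, f1=190, f2=211)
print(f"first formula:  {first:.2f} marks")
print(f"second formula: {second:.2f} marks")
```

The two results differ slightly, which is consistent with the remark that the first formula should generally be preferred.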
3.9 EMPIRICAL RELATION BETWEEN MEAN, MEDIAN AND MODE

In a single-peaked frequency distribution, the values of the mean, median and mode coincide if the
frequency distribution is absolutely symmetrical. But if these values differ, the frequency distribution is
said to be skewed or asymmetrical.

Experience tells us that in a unimodal curve of moderate skewness, the median is usually
sandwiched between the mean and the mode and between them the following approximate relation holds
good.

Mean - Mode = 3 (Mean - Median)

or Mode = 3 Median - 2 Mean.

This empirical relation does not hold in case of a J-shaped or an extremely skewed distribution.
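As a quick numerical check of the empirical relation, the following Python sketch estimates the mode from a given mean and median. The function name `empirical_mode` and the figures (mean 52, median 50) are hypothetical values chosen only for illustration.

```python
def empirical_mode(mean, median):
    """Approximate mode of a moderately skewed unimodal distribution."""
    return 3 * median - 2 * mean


mean, median = 52, 50          # hypothetical values
mode = empirical_mode(mean, median)
print(mode)                    # estimated mode

# The same relation written the other way round:
# Mean - Mode should equal 3 (Mean - Median).
print(mean - mode, 3 * (mean - median))
```

Here the median lies between the mean and the estimated mode, as the relation requires for a moderately skewed curve.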


3.10 THE BOX PLOTS 


The Box plots, which are graphically very simple, are based on the Median, a measure of location
and the Interquartile Range (IQR), a measure of data's variability. They are informative and effective for
comparing two or more data sets or distributions.

A box plot is constructed by drawing a rectangle (the box) with the ends (called the hinges) drawn
at the lower and upper quartiles ($Q_1$ and $Q_3$). The median of the data is shown in the box usually by a +
sign. The straight lines (called the whiskers) are drawn from each hinge to the most extreme observations.
The entire graph is called a Box and Whiskers plot. If one whisker is longer, the distribution of data is
skewed in the direction of the longer whisker. The box plot given below represents the distribution of
examination marks given in Example 3.13.

BOX PLOT FOR DATA IN EXAMPLE 3.13

(Figure: box and whiskers plot with Min, $Q_1$, $Q_3$ and Max marked on a horizontal scale of marks.)

When two or more distributions are to be compared by drawing box plots, the scale of
measurement is usually plotted vertically. Sometimes, two sets of limits, called inner fences and outer
fences, are also used.
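The quantities a box plot displays can be tabulated before any drawing is done. The Python sketch below is an illustration only: the function name `box_plot_summary` is an assumption, and the five values passed in are the smallest mark, the quartiles, the median and the largest mark of the nine students in Example 3.11.

```python
def box_plot_summary(minimum, q1, median, q3, maximum):
    """Collect the lengths needed to draw and read a box and whiskers plot."""
    lower_whisker = q1 - minimum   # from the lower hinge to the smallest value
    upper_whisker = maximum - q3   # from the upper hinge to the largest value
    if upper_whisker > lower_whisker:
        longer = "upper"
    elif lower_whisker > upper_whisker:
        longer = "lower"
    else:
        longer = "equal"
    return {
        "median": median,
        "iqr": q3 - q1,            # length of the box
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "longer_whisker": longer,
    }


summary = box_plot_summary(minimum=32, q1=36, median=39, q3=45, maximum=48)
print(summary)
```

The entry `longer_whisker` indicates the direction in which the data appear skewed, following the rule stated above.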

3.11 RELATIVE MERITS AND DEMERITS OF VARIOUS AVERAGES

It is necessary to understand the merits and demerits of each one of the averages in order that it
may be appropriately employed.

3.11.1 The Arithmetic Mean. The advantages of the mean are:
i) It is rigorously defined by a mathematical formula.
ii) It is based on all the observations in the data.
iii) It is easy to calculate and simple to comprehend.
iv) It is determined for almost every kind of data.
v) It is a relatively stable statistic with the fluctuations of sampling. That is why it is universally
used.
vi) It is amenable to mathematical treatment.
The disadvantages of the mean are:
i) It is greatly affected by extreme values in the data.
ii) It gives sometimes fallacious conclusions.
iii) In a highly skewed distribution, the mean is not an appropriate measure of average.
iv) If the grouped data have "open-end" classes, the mean cannot be calculated without assuming the
limits.

3.11.2 The Geometric Mean. The advantages of the geometric mean are:
i) It is rigorously defined by a mathematical formula.
ii) It is based on all observed values.
iii) It is amenable to mathematical treatment in certain cases.
iv) It gives equal weightage to all the observations.
v) It is not much affected by sampling variability.
vi) It is an appropriate type of average to be used in case rates of change or ratios are to be
averaged.
The disadvantages are:
i) It is neither easy to calculate nor to understand.
ii) It vanishes if any observation is zero.
iii) In case of negative values, it cannot be computed at all.
3.11.3 The Harmonic Mean. The advantages of the harmonic mean are:
i) It is rigorously defined by a mathematical formula.
ii) It is based on all the observations in the data.
iii) It is amenable to mathematical treatment.
iv) It is not much affected by sampling variability.
v) It is an appropriate type for averaging rates and ratios.
The disadvantages of the harmonic mean are:
i) It is not readily understood.
ii) It cannot be calculated, if any one of the observations is zero.
iii) It gives too much weightage to the smaller observations.
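Two of the disadvantages listed for the geometric and harmonic means (the geometric mean vanishes, and the harmonic mean cannot be calculated, when an observation is zero) can be seen directly in a small Python sketch. The helper names and the sample values 2 and 8 are assumptions chosen for illustration; positive observations are assumed for the geometric mean.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # nth root of the product; assumes no negative observations.
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    return len(values) / sum(1 / v for v in values)


sample = [2, 8]                        # hypothetical observations
print(arithmetic_mean(sample))         # largest of the three
print(geometric_mean(sample))
print(harmonic_mean(sample))           # smallest, pulled towards the value 2

print(geometric_mean([0, 5, 9]))       # vanishes because one value is zero
```

For positive values that are not all equal, the three results appear in descending order, and the harmonic mean lies closest to the smaller observation.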
3.11.4 The Median. The advantages of the median are:
i) It is easily calculated and understood.
ii) It is located even when the values are not capable of quantitative measurement.
iii) It is not affected by extreme values. It can be computed even when a frequency distribution
involves "open-end" classes like those of income and prices.
iv) In a highly skewed distribution, the median is an appropriate average to use.
The median has the following disadvantages:
i) It is not rigorously defined.
ii) It is not capable of lending itself to further statistical treatment.
iii) It necessitates the arrangement of data into an array which can be tedious and time consuming
for a large body of data.
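The contrast between the mean, which is greatly affected by extreme values, and the median, which is not, can be checked with Python's standard `statistics` module. The two small data sets below are hypothetical and differ only in their largest observation.

```python
from statistics import mean, median

typical = [20, 22, 25, 27, 30]        # hypothetical observations
with_extreme = [20, 22, 25, 27, 500]  # same data, one abnormally large value

print(mean(typical), median(typical))
print(mean(with_extreme), median(with_extreme))
```

Replacing a single observation by an extreme value shifts the mean substantially, while the median stays at the middle value of the ordered data.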


3.11.5 The Mode. The advantages of the mode are:
i) It is simply defined and easily calculated. In many cases, it is extremely easy to locate the
mode.
ii) It is not affected by abnormally large or small observations.
iii) It can be determined for both the quantitative and the qualitative data.
The disadvantages of the mode are:
i) It is not rigorously defined.
ii) It is often indeterminate and indefinite.
iii) It is not based on all the observations made.
iv) It is not capable of lending itself to further statistical treatment.
v) When the distribution consists of a small number of values, the mode may not exist.
EXERCISES

3.1 Answer 'True' or 'False'. If the statement is not true then replace the underlined words with
words that make the statement true:


A measure of central tendency is a quantitative value that tends to locate in some sense the 
middle of a set of data. 


The mean of a sample always divides the data into two equal halves — half larger and half 
smaller. 


For any distribution, the sum of the deviations from the mean equals zero. 
The arithmetic mean is not affected by the extreme values. 
A distribution always has exactly one median score, but it can have more than one mode
score.

The value that occurs most frequently is known as median.

The mean may not exist for some sets of data.

Another name for the median is the score at the thi ile. 

The Harmonic mean, the geometric mean and the arithmetic mean are equal only if all the
numbers $X_1, X_2, \ldots, X_n$ are identical.

The Harmonic mean is the nth root of the product of the numbers.
The first quartile is also referred to as the 25th percentile.

If a distribution of scores is symmetric, then the median and the mean will be the same.
If a distribution is positively skewed, then the mean is smaller than the median.
If a distribution is skewed to the left, generally mean > median > mode.
If the mean, median and mode of a distribution are 5, 6, and 8, respectively, the distribution is
positively skewed.
3.2 MULTIPLE CHOICE QUESTIONS.

i) Half the observations are always larger than the
a) Mean b) Total c) Median d) Mode

ii) The value that occurs most often in a set of data is called the
a) Mean b) Mode c) Geometric mean d) Harmonic mean

iii) In case of an open-end class,
a) A median cannot be computed.
b) The arithmetic mean and the median will always be exactly equal.
c) A mean cannot be computed.
d) The distribution is always positively skewed.

iv) Which of the following is a true statement about the median?
a) It is always one of the data values.
b) It is influenced by extreme values.
c) Fifty percent of the observations are larger than the median.
d) It is the middle value of the data values.

v) Which of the following is not a characteristic of the arithmetic mean?
a) It is influenced by extreme values.
b) The sum of the deviations of the observations from the mean is zero.
c) Fifty percent of the observations are larger than the mean.
d) The sum of the squared deviations from the mean is always minimum.

vi) Find the mean of the following sample of distances of stars from the earth:
18.2, 56.9, 24.6, 13.5
a) $\bar{X}$ = 28.30 b) $\bar{X}$ = 43.40 c) $\mu$ = 28.30 d) $\mu$ = 43.40

vii) In a positively skewed distribution, the mean is always
a) Smaller than the median b) Equal to the median
c) Larger than the median d) Equal to the mode

viii) The median is larger than the arithmetic mean when
a) The distribution is positively skewed.
b) The distribution is negatively skewed.
c) The data is organised into a frequency distribution.
d) The distribution is symmetrical.

ix) The geometric mean of the numbers 2, 4 and 8 is
a) 3.67 b) 4 c) 3.43 d) 5

x) Which of the following statements is not true for Harmonic Mean?
a) Harmonic mean is smaller than the mean.
b) It is based on all the values.
c) It is an appropriate type for averaging rates and ratios.
d) It gives equal weightage to all the values.


3.3 What is a statistical average? Name the important types of averages. Discuss the advantages
and disadvantages of each average. (P.U., B.A. (Hons.) 1


What is a measure of “central tendency"? What is the purpose served by it? What are its 
desirable qualities? 


What are the principal criteria for a satisfactory average? State giving reasons the 
circumstances in which it would be preferable to use (i) the mean, (ii) the median (iii) the 
mode, (iv) geometric and (v) harmonic mean. 


What criteria do you apply to judge the merits of an average? Discuss the merits and demerits 
of the different averages in common use with special reference to these criteria. 


In what circumstances would you consider the Arithmetic mean, the Geometric mean and the
Harmonic mean respectively, the most suitable statistic to describe the central tendency of
distributions? (P.U., B.A./B.Sc. 1989)
What are the different measures of central tendency? Describe the manner of computation of 
any three of them with suitable illustrations. (P.U., M.A. Econ. 1967) 


Define weighted average and explain how it differs from simple mean. Give the method of its
computation and discuss the use of weighted mean in Statistics. (P.U., M.A. Econ. 1974)

What is the median? What are its advantages and disadvantages? Give reasons why the
statistician usually prefers the arithmetic mean to the median. (P.U., M.A. Econ. 1981)


Define, and explain how to compute, the following measures for a grouped distribution:
x0. Q3, D7 and Mode. ` 


Define the arithmetic mean, the mode and the median. Explain the relationship of these three
measures of location in a skewed distribution. State the chief advantages of the arithmetic
mean as a form of average.
Define Mean, Median and Mode. What are their advantages and limitations in the analysis of
data? Give various methods of calculating Arithmetic mean, with illustrations.
(P.U., B.A./B.Sc. 1958, 1960)

a) Define Mean, Median and Mode. Give an empirical relation between them. Does this

relation give Фа е for the mode? 
b) Criticise the statements: 

i) An average does not reveal all the information about the data.

ii) The median is described as the value of the average rather than the average value, 


Comment on the following statements: 


1) The depth of a river at four different points is 2, 7, 5, 6 feet respectively. The average 
depth is 5 feet. Therefore all the people with heights above 5 feet can cross it. 


ii) The average marks of one class of students are 30. Therefore every student is hopeless. 


iii) The average income of a king and his household servants is £20,000 per month, therefore 
all the household servants must be fabulously paid. 


iv) On an average, the number of accidents occurring in the middle of the road are 5 per
thousand. The number of deaths at other places is 30 per thousand. Therefore, it is safer to
walk in the middle of the road.

v) In a country, 2,000 vaccinated persons died. Therefore vaccination is useless.

3.14 Define the mean, the median and the mode of a frequency distribution.
It is commonly true that the median lies between the other two measures and is approximately
twice as far from the mode as from the mean. State with reasons, whether you expect this
relationship to hold, and which of the three statistics is likely to be the most useful single
statistic for each of the following distributions:


i) The annual earnings of employed males in Pakistan 
ii) The percentage of sky, to the nearest 10 per cent, covered by cloud at Karachi at mid-day 


iii) The exact length of rods cut to a standard size by machine.
(P.C.S., 1971)


3.15 a) Define arithmetic mean and describe its properties. 


b) If the arithmetic mean of n numbers $x_1, x_2, \ldots, x_n$ is M and A is any arbitrary number,
show that

$\sum (x_i - A)^2 = \sum (x_i - M)^2 + n(M - A)^2$ (P.U., B.A./B.Sc. 1977, 82)

c) A distribution consists of three components with frequencies 3, 4 and 5, and having
means 2, 5.5 and 10. Find the mean of the combined distribution. (P.U., B.A./B.Sc. 19


3.16 a) State the properties of the arithmetic mean.
b) Show that $\sum (x_i - a)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - a)^2$. In other words, show that the sum of
squares about $a = \bar{x}$ is smallest.
c) A distribution consists of 3 components with frequencies 45, 40 and 65, having their
means 2, 2.5 and 2 respectively. Show that the mean of the combined distribution is 2.13
approximately.
3.17 a) The number of cars crossing a certain bridge in a big city in 10 intervals of five minutes
each was recorded as follows:


25, 15, 18 0. 12,9, 16, 15 
Calculate (i) the arithmetic mean, (ii) the median and (iii) the geometric mean.


b) Explain why calculated for a set of ungrouped data might differ from the 
if the same grouped into a frequency distribution. 
3.18 a) Define Arithmetic mean, Geometric mean and Harmonic mean; and prove that for any
two positive numbers a and b,
A.M. ≥ G.M. ≥ H.M. (P.U., B.A./B.Sc. 1991
b) The monthly incomes of ten families in rupees in a certain locality are given below:
Family: жй Бл EYE (Gr 1 1 


Income (Rs.); 85, 70, 10, 75, 500, 8, 42, 250, 40, 36. 


Calculate the arithmetic mean, the geometric mean and the harmonic mean of the above
incomes. Which one of the above three averages represents the above figures best?

Calculate the arithmetic mean, the geometric mean and the harmonic mean of the annual
incomes of fifteen families as given below:


Rs. 60, 80, 90, 96, 120, 150, 200, 360, 480, 520, 1060, 1200, 1450, 2500, 7200. 


a) In a company having 80 employees, 60 earn Rs.3.00 per hour and 20 earn Rs.2.00 per
hour. (i) Determine the mean earnings per hour. (ii) Do you consider this mean hourly
wage to be typical? (P.U., B.A./B.Sc. 1980-S)

b) An examination candidate's percentages are: English, 73; French, 82; Mathematics, 57;
Science, 62; History, 60. Find the candidate's weighted mean if weights of 4, 3, 3, 1, 1
respectively are allotted to the subjects.

Find (i) the simple average of prices in column 2 and (ii) the weighted average, using the
quantities in column 3 as weights, and explain the difference between the two results.


а) (2) (3) 
Piece goods Price per metre Quantity 
Rs. millions metres 


Unbleached 
Bleached 
Printed flags 
Other sorts 
Dyed in piéce 
Of dyed ya 


19 126 114 100 88 62 77 89-7 103 108 
144 129 148 63 . 69 148 132 8 142 116 
1233 104 95 80 85 106 12249833 140 134 


The firm gave bonuses of Rs. 10, 15, 20, 25, 30 or individuals in the respective salary 
groups: exceeding 60 but not exceeding 75, g 75 but not exceeding 90 and so on upto 
exceeding 135 but not exceeding 150. Fi ee eae МА: 


Compute the average age of these horses (a) from the first two columns of the table by the
usual short method, (b) from the last two columns by weighting the group averages by the
number of horses in the groups. Compare the two results. Which one is more nearly the real
average age?
Find the arithmetic and geometric means of the series 1, 2, 4, 8, 16, ..., $2^n$. Find also the
harmonic mean. (P.U. D. St. 1960)

Find (i) arithmetic mean, (ii) geometric mean, and (iii) harmonic mean of the series 1, 3, 9, 27,
..., $3^n$. (P.U., B.A./B.Sc. 1973, 82)
3.26 a) Define Geometric mean and describe its advantages and disadvantages.
b) Given two sets, each of n positive values, $X_{11}, X_{12}, \ldots, X_{1n}$; $X_{21}, X_{22}, \ldots, X_{2n}$, prove that
the geometric mean of the ratios of corresponding values in the two sets is equal to the
ratio of the geometric means of the two sets. (P.U. B.A./B.Sc.

Hint. Let a ratio be defined as $X = \frac{X_1}{X_2}$.

Then $\log X = \log X_1 - \log X_2$.
Sum for all pairs of $X_1$'s and $X_2$'s.

Hence $G = \frac{G_1}{G_2}$.

3.27 A man gets a rise of 10% in salary at the end of his first year of service, and further rises of
20% and 25% at the end of the second and third years respectively, the rise in each case being
calculated on his salary at the beginning of the year. To what annual percentage increase is
this equivalent?


„3.28 а) Define Harmonic mean. How does it differ fi 
advantages and disadvantages? 

7) А man travels from A to B at average speed of, iles per hour and returns from 

along the same route at an average speed of» miles per hour. Find the average 

the entire journey. (P.U., B.A./B, 

~¢) Find out the average speed of perso rides the first mile at the rate of 8 

boo, the. pent кес at the tate АҚҚ es an hour and the third mile at the 

miles an hour. (P.U., B.A./B. 


“ 

3.29 a) A bus travelling 200 km has 10 stages at equal intervals. The speed of the bus in
the various stages was observed to be 10, 15, 20, 25, 20, 30, 40, 50, 30, 40 kilometres per
hour. Find the average speed at which the bus travels.

b) Find out the average rate of (i) motion in the case of a person who rides the first mile at
the rate of 10 miles an hour, the next mile at the rate of 8 miles per hour, and the third
mile at the rate of 6 miles per hour; (ii) increase in population, which in the first

i %, in the next 25% and in the third 44%.  (P.U., B.A. (тй 


eqüency 


3.32 Find the mean or median, whichever you think more suitable, in each of the following” 
i) Salaries of 5 men in an industrial concern: 
Rs.950, Rs.2100, Rs.1500, Rs.100, Rs.10,000. 
ii) Heights of 6 boys: 64", 65", 65", 66", 66", 67". 
iii) Handicaps of four golfers: 4, 18, 18, 20. 
The following data relate to sizes of shoes sold at a store during a given week. Find the
median size of the shoes. Also calculate the quartiles, the 7th decile and the 64th percentile.
[SizeofShes | 5 | 54 | 6 | 6% [ 7 [7и | 8 | 84 | 9 | 95 | 
п 12 1+ [з [%[ юрю RO S 


Find paces value of the ст from the Ogive and check your answer by calculation. 
(B.LS.E. Sargodha, 1969-5) 


Estimate graphically and by formula the median and quartile ages of head of household from
the following distribution: 


A 65.0-, 67.5-, 70.0-, 172.5- 
412 127 38 
\ кт (P.U., М.А. Есоп., 1969) 


Explain when median is more representative than mean. Calculate the median of the following 
distribution. 


Class Number Class Number. Class Number 


100-104 E 124-129 298 150-154 260 
105-109 14 130-134 380 155-159 128 
110-114 60 135-139 450 160-164- 66 
115-119 138 140-144 500 165-169 28 
120-124 236 145-149 430 170-174 12 


(P.U., B.A/B.Sc. 1960) 
frequency ag iat Fe ersons according to below:
Pipe erre 55 [1-1 [ 20:29 | 30-39 | 40-5 | 5 
[No.ofpersons | 5 | 10 | n] 12 | 22 | в | 8 | 7. 


Calculate the Mean and the Median ages of the distribution. 
3.40 a) Describe the merits and demerits of mean and median.

b) Calculate the median, the upper and lower quartiles from the following data. Also draw a
box plot.


(0 


ІСІ ТІСІ | ха [н | эн ы [о [лк [зл [тез 


Estimate the mean, the median 


3.42 The yields of grain ( x /b) 
interval (0.2 Ib.) in the 


MON small plots are grouped in classes with a common cl 
low, the values of x given being the midvalues of the cla: 


Sank дық ibution is 3.95 Ib.; the median is 3.95 Ib.; and quartiles are 
Ib. and 4.28 Ib. 
x (lb.)   Frequency      x (lb.)   Frequency
2.8           4          4.2          69
3.0          15          4.4          59
3.2          20          4.6          35
3.4          47          4.8          10
3.6          63          5.0           8
3.8          78          5.2           4
4.0          88          Total       500


7343 


100 114.9 


115 - 129.9 
130 – 144.9 
145 – 159.9 
a) Find the average weight, the median weight and the most common weight (mode) of the
eds. 


b) Find the first and third quartiles. Find the third decile and the 45th percentile.
c) Explain your answers as you would to a person who had never studied statistics. 


3.44 In a group of 500 wage-earners, the weekly wages of 4% were under Rs.60 and those of 15%
were under Rs.62.50, 15% of the workers earned Rs.95 and over, and 5% of them got Rs.100
and over.


The median and quartile wages were Rs.82.25, Rs.72.75 and Rs.90.50; the fourth and sixth 
decile wages were Rs.78.75 and Rs.85.25 respectively. 


Put the above information in the form of a frequency distribution and estimate the mean wage 
of the 500 wage-earners therefrom. 


Hint. First put the information in the form of a cumulative frequency table, 


a) Describe the advantages and disadvantages of the mean, the median and the mode. 
Explain the empirical relation between them. 


b) The weights of the 40 male students at a university are given in the following frequency
table:

Calculate the mean, median and the mode. (P.U., B.A./B.Sc. 1969)
The Pod ahi. ne shows the aaa the maximum loads in short tons supported by 


Rete of employees | 3 |13 [з | 102 | 175 [220 [2s [13 [oo [25] 6 [ 1 | 
Calculate the Modal and Median wages and explain why there is a difference between the two. 


a) Define the mode of a frequency distribution. How does it compare with other types of
averages?

b) Write down the empirical relation between mean, median, and mode for unimodal
distributions of moderate asymmetry. Illustrate graphically the relative positions of the
mean, median and mode for frequency curves which are skewed to the right and to the
left. (P.U., B.A./B.Sc. 1972, 80-S)

c) For a certain frequency distribution, the mean was 40.5 and median 36. Find the mode
approximately using the formula connecting the three.
(P.U., B.A./B.Sc. Optional, 1971-S)


| 349 а) 
| 

b) 

3.50 а) 

b) 
What types of averages would be suitable for the following cases? Give reasons.
i) Size of agricultural holdings. 

ii) Heights of students. 

iii) Marks obtained in any examination, 

iv) Income of workers in a factory. 

v) Percapita income in Pakistan. 

vi). Comparison of intelligence. 

vii) Volumes of sales of ready-made shirts, shoes and collars. 

viii) Number of petals of flowers. 


What measures of central tendency would you recommend for the following cases? 
reasons in support of your answer. 


i) Symmetrical Distribution. 
ii) A J-shaped Distribution.
ii) Distribution having “open-end” classes арба of the classes. 
iv) Frequency distribution of a quantitative variable.

& , (P.U., М.А. Econ., 
A distribution $x_1, x_2, \ldots, x_n$ with frequencies $f_1, f_2, \ldots, f_n$ is transformed
into the distribution $X_1, X_2, \ldots, X_n$ with the same corresponding frequencies by the
relation $X_r = a x_r + b$, where a and b are constant. Show that the mean, mode and
median of the new distribution are given in terms of those of the first distribution by the
same transformation.


A distribution djs values of the variable x;,x5,..,x; with corresponding : 
fifi «S new distribution with the same frequencies is formed by 
X, 7 2283 for values ofr (r= 1,2, ..., 9. If the values of the mean, median and 
of the original distribution are а, b and c respectively, what are these values of 
distribution? (P.U., B.A/B.Se 


9999999999 
4.1 INTRODUCTION

It is quite possible that two or more sets of data may have the same average (mean, median or
mode) but their individual observations may differ considerably from the average. Thus a value of central
tendency does not adequately describe the data. We therefore need some additional information
concerning how the data are dispersed about the average. This is done by measuring the dispersion
by which we mean the extent to which the observations in a sample or in a population vary about their
mean. A quantity that measures this characteristic, is called a measure of dispersion, scatter or
variability. It is desirable to have the measure of dispersion (i) in the same units as the observations,
(ii) zero when all the observations are equal, (iii) independent of origin, (iv) multiplied or divided by a
constant when each observation is multiplied or divided by that constant. It is also desirable that it should
satisfy the conditions similar to those laid down for an average in the previous chapter (see section 3.2).


EET Vier М шысы зыбыз 


For example, if the units of the data are rupees, metres, ms, etc., the units of the 
gus заа ela 


d in Ша д M 
re s use comparison of data of different . А measure of central tendency 
her with a measure of. dispersion gives an adequate f data, 
The main measures of dispersion are the following:
i) The Range.
ii) The Semi-Interquartile Range or the Quartile Deviation.
iii) The Mean Deviation or the Average Deviation.
iv) The Variance and the Standard Deviation.


4.2 THE RANGE

The range R, is defined as the difference between the largest and the smallest observations in a set
of data. Symbolically, the range is given by the relation

$R = x_m - x_0$,

where $x_m$ stands for the largest observation and $x_0$ denotes the smallest one. When the data are grouped
into a frequency distribution, the range is estimated by finding the difference between the upper boundary
of the highest class and the lower boundary of the lowest class. The range cannot be computed if there are
any open-end classes in the distribution.

The range is a simple concept and is easy to compute. It has, however, two serious disadvantages.
First, it ignores all the information available from the intermediate observations; and second, as its value
is based only on the two extreme (unusually large or small) observations, it might give a misleading
picture of the spread in the data. It is therefore an unsatisfactory measure of dispersion. However, it is
appropriately used in statistical quality control charts of manufactured products, daily temperatures, stock
prices, etc. This is an absolute measure of dispersion. Its relative measure known as the co-efficient of
dispersion, is defined by the following relation:

Co-efficient of Dispersion = $\frac{x_m - x_0}{x_m + x_0}$.

This is a pure (i.e. dimensionless) number and is used for the purposes of comparison.
Example 4.1 The marks obtained by 9 students are given below:
45, 32, 37, 46, 39, 36, 41, 48, 36.
Find the range and the co-efficient of dispersion.
Here the highest marks, i.e. $x_m = 48$,
and the lowest marks, i.e. $x_0 = 32$.
$R = x_m - x_0 = 48 - 32 = 16$ marks, and

$$\text{Co-efficient of Dispersion} = \frac{x_m - x_0}{x_m + x_0} = \frac{48 - 32}{48 + 32} = \frac{16}{80} = 0.20.$$
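The arithmetic of Example 4.1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `range_and_coefficient` is an illustrative choice and not part of the text.

```python
def range_and_coefficient(data):
    """Return the range R = x_m - x_0 and the co-efficient of dispersion."""
    largest, smallest = max(data), min(data)
    r = largest - smallest
    coefficient = r / (largest + smallest)  # pure number, no units
    return r, coefficient

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
r, coefficient = range_and_coefficient(marks)
print(r, round(coefficient, 2))
```

Only the two extreme observations enter the calculation, which is exactly the weakness of the range noted above.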


4.3 THE SEMI-INTERQUARTILE RANGE OR THE QUARTILE DEVIATION


The interquartile range is a measure of dispersion, defined by the difference between the third and
first quartiles; and half of this range is called the semi-interquartile range (S.I.Q.R.) or the quartile
deviation (Q.D.). Symbolically, we have

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2},$$

where $Q_1$ and $Q_3$ are the first and the third quartiles of the data. The quartile deviation is
superior to the range as it is not affected by extremely large or small observations. It is simple to understand
and easy to calculate. It has certain disadvantages. It gives no information about the position of
observations lying outside the two quartiles; it is not amenable to mathematical treatment and is greatly
affected by sampling variability. The quartile deviation is therefore not as widely used as other measures
of dispersion. It is, however, used in situations where extreme observations are thought to be
unrepresentative.

The quartile deviation is also an absolute measure of dispersion. Its relative measure, called the
Co-efficient of Quartile Deviation or of Semi-Interquartile Range, is defined by the relation

$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1},$$

which is a pure number and is used for comparing the variation in two or more sets of data.


Example 4.2 Find the quartile deviation and the co-efficient of quartile deviation for (i) the data of
Example 3.11 and (ii) the frequency distribution in Example 3.13.


i) Using the data of Example 3.11, we find that
$Q_1 = 36$ marks, $Q_3 = 45$ marks, and therefore

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{45 - 36}{2} = 4.5 \text{ marks},$$

$$\text{Co-efficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{45 - 36}{45 + 36} = \frac{9}{81} = 0.11.$$

ii) Values of $Q_1$ and $Q_3$ calculated in Example 3.13 are respectively 56 and 74 marks. Therefore

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{74 - 56}{2} = 9 \text{ marks},$$

$$\text{Co-efficient of Q.D.} = \frac{74 - 56}{74 + 56} = \frac{18}{130} = 0.14.$$


4.4 THE MEAN (OR AVERAGE) DEVIATION


The mean deviation (M.D.) of a set of data is defined as the arithmetic mean of the deviations
measured either from the mean or from the median, all deviations being counted as positive. The reason
for taking the deviations as positive, i.e. disregarding the algebraic signs (+ and -), is to avoid the difficulty
arising from the property that the sum of deviations of the observations from their mean is zero. The
symbolic definition of the mean deviation from the mean is

$$\text{M.D.} = \frac{\sum |x_i - \bar{x}|}{n}, \text{ for sample data,}$$

$$\text{M.D.} = \frac{\sum |x_i - \mu|}{N}, \text{ for population data,}$$

where $|x_i - \bar{x}|$ and $|x_i - \mu|$ (pronounced "mod deviations") indicate the absolute deviations of the
observations from the mean of a sample and of a population respectively. It is more appropriate to call it the
mean absolute deviation (M.A.D.).

For the data organised into a grouped frequency distribution having k classes with midpoints
$x_1, x_2, \ldots, x_k$ and the corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the mean deviation of the
sample is given by

$$\text{M.D.} = \frac{\sum f_i |x_i - \bar{x}|}{n}.$$

The mean deviation is also defined in terms of absolute deviations from the median in a similar way.
It can be shown that the mean deviation is least when the deviations are measured from the median. But in
practice it is generally calculated from the arithmetic mean. The mean deviation gives more information
than the range or the quartile deviation as it is based on all the observed values. It is easily calculated and
understood. As it is not amenable to mathematical treatment, its usefulness is limited. We introduce
artificiality in its calculation by ignoring the algebraic signs of the deviations and this step is
not mathematically defensible. As the mean deviation does not give undue weight to occasional large
deviations, it is used in situations where such deviations are likely to occur. It is unsatisfactory for
statistical inference.


The mean deviation is an absolute measure of dispersion. Its relative measure, known as the co-efficient
of mean deviation, is obtained by dividing the mean deviation by the average used in the calculation of
deviations, i.e.

$$\text{Co-efficient of M.D.} = \frac{\text{M.D.}}{\text{Mean}} \quad \text{or} \quad \frac{\text{M.D.}}{\text{Median}}.$$


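Example 4.3 Calculate the mean deviation from (i) the mean, (ii) the median, of the following set
of examination marks:

45, 32, 37, 46, 39, 36, 41, 48 and 36.
Also calculate the co-efficient of mean deviation.


We first arrange the given marks in an increasing sequence to find the median. The ordered marks
are

32, 36, 36, 37, 39, 41, 45, 46, 48.

Median = marks obtained by the $\left(\frac{n+1}{2}\right)$th student in the ordered data, as $n/2$ is not an integer
= marks obtained by the $\left(\frac{9+1}{2}\right)$th, i.e. the 5th, student
= 39 marks.

Mean $\bar{x} = \frac{\sum x_i}{n} = \frac{360}{9} = 40$ marks.

| $x_i$ | $|x_i - \bar{x}|$ | $|x_i - \text{median}|$ |
| 32 | 8 | 7 |
| 36 | 4 | 3 |
| 36 | 4 | 3 |
| 37 | 3 | 2 |
| 39 | 1 | 0 |
| 41 | 1 | 2 |
| 45 | 5 | 6 |
| 46 | 6 | 7 |
| 48 | 8 | 9 |
| Total | 40 | 39 |

$$\text{M.D. (from mean)} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{40}{9} = 4.44 \text{ marks,}$$

$$\text{and M.D. (from median)} = \frac{\sum |x_i - \text{median}|}{n} = \frac{39}{9} = 4.33 \text{ marks.}$$

$$\text{Co-efficient of M.D.} = \frac{\text{M.D.}}{\bar{x}} \text{ or } \frac{\text{M.D.}}{\text{median}} = \frac{4.44}{40} \text{ or } \frac{4.33}{39} = 0.11 \text{ or } 0.11.$$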
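The two mean deviations of Example 4.3 can be reproduced with Python's standard `statistics` module. The helper name `mean_deviation` is an assumption made for this sketch.

```python
from statistics import mean, median

def mean_deviation(data, centre):
    """Arithmetic mean of the absolute deviations from `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
md_mean = mean_deviation(marks, mean(marks))      # deviations taken from the mean (40)
md_median = mean_deviation(marks, median(marks))  # deviations taken from the median (39)
print(round(md_mean, 2), round(md_median, 2))
```

The value measured from the median is the smaller of the two, in line with the remark that the mean deviation is least when taken from the median.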
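Example 4.4 Calculate the mean deviation of the following frequency distribution of weights (in grams).

| Weight (grams) | Midpoint $x_i$ | $f_i$ | $f_i x_i$ | $|x_i - \bar{x}|$ | $f_i |x_i - \bar{x}|$ |
| 65 - 84 | 74.5 | 9 | 670.5 | 48 | 432.0 |
| 85 - 104 | 94.5 | 10 | 945.0 | 28 | 280.0 |
| 105 - 124 | 114.5 | 17 | 1946.5 | 8 | 136.0 |
| 125 - 144 | 134.5 | 10 | 1345.0 | 12 | 120.0 |
| 145 - 164 | 154.5 | 5 | 772.5 | 32 | 160.0 |
| 165 - 184 | 174.5 | 4 | 698.0 | 52 | 208.0 |
| 185 - 204 | 194.5 | 5 | 972.5 | 72 | 360.0 |
| Total | | 60 | 7350.0 | | 1696.0 |

$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{7350}{60} = 122.5 \text{ grams,}$$

$$\text{M.D.} = \frac{\sum f_i |x_i - \bar{x}|}{n} = \frac{1696.0}{60} = 28.27 \text{ grams.}$$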

4.5 THE VARIANCE AND STANDARD DEVIATION


Variance. The variance of a set of observations is defined as the mean of the squares of deviations of all the
observations from their mean. When it is calculated from the entire population, the variance is called the
population variance, traditionally denoted by $\sigma^2$ ($\sigma$ is the Greek lowercase "sigma"). If, instead, the
data from the sample are used to calculate the variance, it is referred to as the sample variance and is
denoted by $S^2$ in order to distinguish between the two. The symbolic definition for variance is

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}, \text{ for population data,}$$

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n}, \text{ for sample data.}$$

The variance is also denoted by Var(X). The term variance was introduced in 1918 by R.A. Fisher.

It should be noted that the variance is in the square of the units in which the observations are expressed
and that the variance is a large number compared to the observations themselves. The variance, because of its
mathematical properties, assumes an extremely important role in statistical theory.

Standard Deviation. The positive square root of the variance is called the standard deviation. Symbolically,

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}, \text{ for population data,}$$

$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}, \text{ for sample data.}$$


The standard deviation is expressed in the same units as the observations themselves and is a
measure of the average spread around the mean. Karl Pearson (1857-1936), "founder of the science of
statistics", is credited with the name standard deviation, the most useful measure of dispersion. The sample
variance in some texts is defined as

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1},$$

where n is replaced by n - 1 on the basis of the argument that knowledge of any n - 1 deviations
automatically determines the remaining deviation, as the sum of n deviations must be zero. This is
an unbiased estimator of the population variance $\sigma^2$, the explanation for which is deferred to the chapter on
estimation, where we shall learn that the sample variance $S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$, for small samples,
underestimates the population variance $\sigma^2$.
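The difference between the two divisors is easy to see numerically. In Python's standard `statistics` module, `pvariance` divides by n (the $S^2$ of this chapter) and `variance` divides by n - 1 (the $s^2$ used in some texts). The data are the nine marks used in the earlier examples.

```python
from statistics import pvariance, variance

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]

var_n = pvariance(marks)         # sum of squared deviations divided by n
var_n_minus_1 = variance(marks)  # sum of squared deviations divided by n - 1
print(round(var_n, 2), var_n_minus_1)
```

The n - 1 version is always the larger of the two, and the gap shrinks as n grows.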



When the data are grouped into a frequency distribution having k classes with midpoints
$x_1, x_2, \ldots, x_k$ and the corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the sample variance and standard
deviation are given by

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n}, \qquad S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}.$$

It should be noted that in a frequency distribution, as the number of observations or total
frequency n is usually large, dividing the sum of squared deviations by n - 1 is practically equivalent to
dividing it by n.


The standard deviation has a definite mathematical meaning, utilizes all the observed values, and is
amenable to mathematical treatment, but is affected by extreme values. The standard deviation is an
absolute measure of dispersion. Its relative measure, called the coefficient of standard deviation, is defined as

$$\text{Coefficient of S.D.} = \frac{\text{Standard Deviation}}{\text{Mean}}.$$

The quantity $\sqrt{\frac{\sum (x_i - a)^2}{n}}$, where a is some arbitrary origin, is called the root-mean-square
deviation, which becomes the standard deviation when this arbitrary origin coincides with the mean.


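To calculate the variance and standard deviation on an electronic calculator, the alternative
formulas for use are obtained by showing that $\sum (x_i - \mu)^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{N}$.

Now

$$\sum (x_i - \mu)^2 = \sum (x_i^2 - 2 x_i \mu + \mu^2)$$
$$= \sum x_i^2 - 2\mu \sum x_i + N\mu^2$$
$$= \sum x_i^2 - 2N\mu^2 + N\mu^2 \qquad (\because \sum x_i = N\mu)$$
$$= \sum x_i^2 - N\mu^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{N}.$$

Thus the sum of squares of the deviations from the mean is equal to the sum of the squares of all
observations minus a correction factor which is (1/N)th of the square of the sum of all $x_i$'s. Hence

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2,$$

i.e. the variance is the mean of the squares minus the square of the mean. The corresponding formula for
the sample variance is

$$S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2.$$

The alternative formulas for standard deviations are

$$\sigma = \sqrt{\frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2}, \qquad S = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}.$$

The alternative formulas for the sample variance and standard deviation of a frequency
distribution are obtained in a similar way:

$$S^2 = \frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2, \qquad S = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2}.$$

Example 4.5 A population of N = 10 has the observations 7, 8, 10, 13, 14, 19, 20, 25, 26 and 28.
Find its variance and standard deviation.
Calculations appear in the following table:

| $x_i$ | $x_i - \mu$ | $(x_i - \mu)^2$ | $x_i^2$ |
| 7 | -10 | 100 | 49 |
| 8 | -9 | 81 | 64 |
| 10 | -7 | 49 | 100 |
| 13 | -4 | 16 | 169 |
| 14 | -3 | 9 | 196 |
| 19 | 2 | 4 | 361 |
| 20 | 3 | 9 | 400 |
| 25 | 8 | 64 | 625 |
| 26 | 9 | 81 | 676 |
| 28 | 11 | 121 | 784 |
| 170 | 0 | 534 | 3424 |

$$\mu = \frac{\sum x_i}{N} = \frac{170}{10} = 17, \qquad \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{534}{10} = 53.4,$$

$$\text{and } \sigma = \sqrt{53.4} = 7.31.$$

Using the alternative method, we get

$$\sigma^2 = \frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2 = \frac{3424}{10} - \left(\frac{170}{10}\right)^2 = 342.4 - 289 = 53.4,$$

$$\sigma = \sqrt{342.4 - 289} = \sqrt{53.4} = 7.31.$$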
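The equivalence of the definitional formula and the calculator ("mean of squares minus square of mean") formula can be confirmed on the population of Example 4.5. This is a small sketch in Python; the variable names are illustrative.

```python
from math import sqrt

population = [7, 8, 10, 13, 14, 19, 20, 25, 26, 28]
N = len(population)
mu = sum(population) / N

# Definition: mean of the squared deviations from the mean.
var_definition = sum((x - mu) ** 2 for x in population) / N
# Alternative formula: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in population) / N - (sum(population) / N) ** 2

sigma = sqrt(var_definition)
print(round(var_definition, 1), round(var_shortcut, 1), round(sigma, 2))
```

In floating-point arithmetic the shortcut subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the definitional form is usually the safer one on a computer.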


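Example 4.6 Calculate the variance and standard deviation from the following marks obtained by
9 students.

45, 32, 37, 46, 39, 36, 41, 48, 36

The variance $S^2$ and the standard deviation S for the sample are calculated as below:

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $x_i^2$ |
| 45 | 5 | 25 | 2025 |
| 32 | -8 | 64 | 1024 |
| 37 | -3 | 9 | 1369 |
| 46 | 6 | 36 | 2116 |
| 39 | -1 | 1 | 1521 |
| 36 | -4 | 16 | 1296 |
| 41 | 1 | 1 | 1681 |
| 48 | 8 | 64 | 2304 |
| 36 | -4 | 16 | 1296 |
| 360 | 0 | 232 | 14632 |

$$\bar{x} = \frac{360}{9} = 40 \text{ marks}, \qquad S^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{232}{9} = 25.78 \text{ (marks)}^2,$$

$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{25.78} = 5.08 \text{ marks.}$$

Using the alternative method, we get

$$S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = \frac{14632}{9} - \left(\frac{360}{9}\right)^2 = 1625.78 - 1600 = 25.78 \text{ (marks)}^2.$$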


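Example 4.7 Calculate the variance and standard deviation from the data of Example 4.4.
The necessary calculations may be carried out on an electronic calculator as below:

| $x_i$ | $f_i$ | $f_i x_i$ | $f_i x_i^2$ |
| 74.5 | 9 | 670.5 | 49 952.25 |
| 94.5 | 10 | 945.0 | 89 302.50 |
| 114.5 | 17 | 1946.5 | 222 874.25 |
| 134.5 | 10 | 1345.0 | 180 902.50 |
| 154.5 | 5 | 772.5 | 119 351.25 |
| 174.5 | 4 | 698.0 | 121 801.00 |
| 194.5 | 5 | 972.5 | 189 151.25 |
| Total | 60 | 7350.0 | 973 335.00 |

Thus we find

$$S^2 = \frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2 = \frac{973335}{60} - \left(\frac{7350}{60}\right)^2 = 16222.25 - 15006.25 = 1216 \text{ (grams)}^2,$$

$$\text{and } S = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2} = \sqrt{1216} = 34.87 \text{ grams.}$$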
п п 


4.5.1 Change of Origin and Scale. The computational labour can be reduced by using the same
transformation as was used for computing the arithmetic mean.

Let $u_i = \frac{x_i - a}{h}$, so that $x_i = a + h u_i$ and $\bar{x} = a + h\bar{u}$.

Therefore

$$\sum (x_i - \bar{x})^2 = \sum [(a + h u_i) - (a + h\bar{u})]^2 = h^2 \sum (u_i - \bar{u})^2.$$

Further

$$\sum (u_i - \bar{u})^2 = \sum u_i^2 - \frac{(\sum u_i)^2}{n}.$$

Hence

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n} = h^2 \left[\frac{\sum u_i^2}{n} - \left(\frac{\sum u_i}{n}\right)^2\right].$$

This gives us a short method for hand calculations.

When the data are grouped into a frequency distribution, the corresponding short method for
calculations is

$$S = h \sqrt{\frac{\sum f_i u_i^2}{n} - \left(\frac{\sum f_i u_i}{n}\right)^2},$$

where h is the width of the class-interval, $f_i$ is the frequency of the ith class and $u_i$ is the deviation
from an assumed mean in terms of class intervals. This method is also known as the step-deviation
method or coding method.
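Example 4.8 Find the standard deviation by the short method from the data of Example 4.4.

Let $u_i = \frac{x_i - 114.5}{20}$, where a = 114.5, the value corresponding to the highest frequency, and h = 20, the
class-interval. Then $u_i = -2, -1, 0, 1, 2, 3, 4$. Other calculations appear below:

| $x_i$ | $f_i$ | $u_i$ | $f_i u_i$ | $f_i u_i^2$ |
| 74.5 | 9 | -2 | -18 | 36 |
| 94.5 | 10 | -1 | -10 | 10 |
| 114.5 | 17 | 0 | 0 | 0 |
| 134.5 | 10 | 1 | 10 | 10 |
| 154.5 | 5 | 2 | 10 | 20 |
| 174.5 | 4 | 3 | 12 | 36 |
| 194.5 | 5 | 4 | 20 | 80 |
| Total | 60 | | 24 | 192 |

$$S = h \sqrt{\frac{\sum f_i u_i^2}{n} - \left(\frac{\sum f_i u_i}{n}\right)^2} = 20 \times \sqrt{\frac{192}{60} - \left(\frac{24}{60}\right)^2}$$

$$= 20 \times \sqrt{3.2 - 0.16} = 20 \times \sqrt{3.04} = 20 \times 1.7436 = 34.87 \text{ grams.}$$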
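The coding (step-deviation) method can be compared against the direct calculation. The sketch below uses the midpoints and frequencies of Example 4.4 with the assumed origin a = 114.5 and class width h = 20; names such as `sd_coded` are illustrative.

```python
from math import sqrt

midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
freqs = [9, 10, 17, 10, 5, 4, 5]
a, h = 114.5, 20.0
n = sum(freqs)

# Coded values u = (x - a) / h; here they are -2, -1, 0, 1, 2, 3, 4.
u = [(x - a) / h for x in midpoints]
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))
sd_coded = h * sqrt(sum_fu2 / n - (sum_fu / n) ** 2)

# Direct calculation from the definition, for comparison.
mean_x = sum(f * x for f, x in zip(freqs, midpoints)) / n
sd_direct = sqrt(sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, midpoints)) / n)
print(round(sd_coded, 2), round(sd_direct, 2))
```

Both routes give the same standard deviation because a change of origin leaves it unchanged and a change of scale multiplies it by h.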


4.5.2 Interpretation of Standard Deviation. The standard deviation ($\sigma$ or $S$) does not have a simple
interpretation like the arithmetic mean ($\mu$ or $\bar{x}$), which is interpreted as the balancing point for the
distribution. The standard deviation is a very important concept that serves as a basic measure of
variability. A smaller value of the standard deviation indicates that most of the observations in a data set
are close to the mean, while a large value implies that the observations are scattered widely about the
mean. However, a connection between the standard deviation and the fraction of data included in intervals
constructed around the mean was discovered by the Russian mathematician P.L. Chebyshev
(1821-1894). This result, generally known as Chebyshev's rule, is stated below:

"For any set of data, the interval $\bar{x} - kS$ to $\bar{x} + kS$, where k is any number greater than 1, contains
at least the fraction $\left(1 - \frac{1}{k^2}\right)$ of the data." For example, the intervals $\bar{x} \pm 2S$ and $\bar{x} \pm 3S$ will contain
respectively at least the fractions $\left(1 - \frac{1}{2^2}\right) = \frac{3}{4}$ and $\left(1 - \frac{1}{3^2}\right) = \frac{8}{9}$ of the data.

This rule is applied to any distribution (population or sample) and guarantees the inclusion of a
minimum fraction of the data in the constructed interval, whereas the actual fraction of the included data
(especially in bell-shaped distributions) will exceed $\left(1 - \frac{1}{k^2}\right)$.
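Chebyshev's rule can be illustrated on the nine examination marks used earlier. The sketch below assumes the standard deviation with divisor n (`pstdev` in Python's `statistics` module); the function name `fraction_within` is hypothetical.

```python
from statistics import mean, pstdev

def fraction_within(data, k):
    """Fraction of observations lying in the interval mean - k*S to mean + k*S."""
    centre, sd = mean(data), pstdev(data)
    inside = [x for x in data if abs(x - centre) <= k * sd]
    return len(inside) / len(data)

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
for k in (1.5, 2, 3):
    bound = 1 - 1 / k ** 2  # minimum fraction guaranteed by Chebyshev's rule
    print(k, round(fraction_within(marks, k), 3), ">=", round(bound, 3))
```

The observed fractions are well above the guaranteed minimum, as the text says is typical.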


4.5.3 Co-efficient of Variation. The variability of two or more sets of data cannot be
compared unless we have a relative measure of dispersion. For this purpose, Karl Pearson (1857-1936)
introduced a relative measure of variation, known as the co-efficient of variation, abbreviated C.V. It
expresses the standard deviation as a percentage of the arithmetic mean of a data set. Symbolically, it is
defined as

$$\text{C.V.} = \frac{S}{\bar{x}} \times 100, \text{ for sample data,}$$

$$\text{C.V.} = \frac{\sigma}{\mu} \times 100, \text{ for population data.}$$

As the coefficient of variation is a pure number without units, it is used to compare the variation
in two or more data sets or distributions that are measured in different units, e.g. one may be measured in
hours and the other in kilograms or rupees. A large value of C.V. indicates that the variability is great and
a small value of C.V. indicates less variability.

The coefficient of variation is also used to compare the performance of two candidates or of two
players, given their scores in various papers or games. The smaller the coefficient of variation, the more
consistent is the performance of the candidates or players. Thus it is used as a criterion for the consistent
performance of the candidates or the players. It should be noted that this co-efficient is unreliable when
the arithmetic mean is very small.

Example 4.9 Using the co-efficient of variation, determine whether or not there is greater
variation among the prices of certain similar commodities given, than among the life in hours under test.

Price in Rupees: 8, 13, 18, 23, 30
Life in hours: 130, 150, 180, 250, 345

We have to compute the mean and the standard deviation for each set so that the corresponding
coefficient of variation can be obtained. The necessary arithmetic is shown below:

| Prices $x$ | $x^2$ | Life $y$ | $y^2$ |
| 8 | 64 | 130 | 16900 |
| 13 | 169 | 150 | 22500 |
| 18 | 324 | 180 | 32400 |
| 23 | 529 | 250 | 62500 |
| 30 | 900 | 345 | 119025 |
| 92 | 1986 | 1055 | 253325 |

Prices in Rupees:

$$\bar{x} = \frac{92}{5} = 18.4, \qquad S_x = \sqrt{\frac{1986}{5} - \left(\frac{92}{5}\right)^2} = \sqrt{397.2 - 338.56} = \sqrt{58.64} = 7.66,$$

$$\text{C.V.} = \frac{7.66}{18.4} \times 100 = 41.63\%.$$

Life in Hours:

$$\bar{y} = \frac{1055}{5} = 211 \text{ hours}, \qquad S_y = \sqrt{\frac{253325}{5} - \left(\frac{1055}{5}\right)^2} = \sqrt{50665 - 44521} = \sqrt{6144} = 78.38 \text{ hours},$$

$$\text{C.V.} = \frac{78.38}{211} \times 100 = 37.15\%.$$

We see that the co-efficient of variation for the prices of commodities (X) is larger than that for the
life in hours (Y). Hence the prices of certain similar commodities are showing greater variation than
the life in hours under test.
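The comparison in Example 4.9 can be sketched in Python. The two data lists below are the values consistent with the totals printed in the example (sums 92 and 1055, sums of squares 1986 and 253325); the function name is an illustrative choice, and the standard deviation uses the divisor n.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """Standard deviation (divisor n) as a percentage of the arithmetic mean."""
    return pstdev(data) / mean(data) * 100

prices = [8, 13, 18, 23, 30]         # rupees
life = [130, 150, 180, 250, 345]     # hours

cv_prices = coefficient_of_variation(prices)
cv_life = coefficient_of_variation(life)
print(round(cv_prices, 1), round(cv_life, 1))
```

Because each coefficient is unit-free, rupees and hours can be compared directly. Small differences in the last decimal place relative to the hand calculation come from rounding the standard deviation before dividing.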
Team A:

$$S = \sqrt{\frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2} = 1.308,$$

$$\text{C.V.} = \frac{S}{\bar{x}} \times 100 = \frac{1.308}{1.06} \times 100 = 123.4\%.$$

Team B:

$$S = 1.308,$$

$$\text{Thus C.V.} = \frac{1.308}{1.20} \times 100 = 109.0\%.$$

We see that the co-efficient of variation for team B is smaller than that for team A. Hence
team B is more consistent than team A.


4.5.4 Properties of Variance and Standard Deviation. The variance and standard deviation have
the following useful and interesting properties:

i) The variance of a constant is equal to zero. If a is any constant, then

$$\text{Var}(a) = \frac{\sum (a - a)^2}{N} = 0 \qquad (\because \text{the mean of a constant is the constant itself}).$$

ii) The variance is independent of the origin, i.e. it remains unchanged when a constant is added
to or subtracted from each observation of the variable X. Symbolically,

$$\text{Var}(X + a) = \text{Var}(X).$$

Now

$$\text{Var}(X + a) = \frac{1}{N}\sum [(x_i + a) - (\mu + a)]^2 \qquad \left(\because \frac{\sum (x_i + a)}{N} = \mu + a\right)$$

$$= \frac{1}{N}\sum (x_i - \mu)^2 = \text{Var}(X).$$

Hence Var(X) is invariant to change of the origin.

iii) The variance is multiplied or divided by the square of the constant, when each observation of
the variable X is either multiplied or divided by a constant.

$$\text{Var}(aX) = \frac{1}{N}\sum (a x_i - a\mu)^2 = \frac{a^2}{N}\sum (x_i - \mu)^2 = a^2 \, \text{Var}(X).$$

This may also be interpreted as saying that the variance is multiplied by $a^2$ when the scale of X is
changed by a factor a.

iv) The variance of the sum or difference of two independent variables is equal to the sum of their
respective variances.

If X and Y are two independent variables, then

$$\text{Var}(X \pm Y) = \frac{1}{N}\sum [(x_i \pm y_i) - (\mu_x \pm \mu_y)]^2$$

$$= \frac{1}{N}\sum [(x_i - \mu_x) \pm (y_i - \mu_y)]^2$$

$$= \frac{1}{N}\sum (x_i - \mu_x)^2 + \frac{1}{N}\sum (y_i - \mu_y)^2 \pm \frac{2}{N}\sum (x_i - \mu_x)(y_i - \mu_y).$$

The quantity $\frac{1}{N}\sum (x_i - \mu_x)(y_i - \mu_y)$ is called the covariance and is denoted by Cov(X, Y). We
shall show at some later stage that the covariance of two independent variables is equal to zero. Thus we
have

$$\text{Var}(X \pm Y) = \frac{1}{N}\sum (x_i - \mu_x)^2 + \frac{1}{N}\sum (y_i - \mu_y)^2 = \text{Var}(X) + \text{Var}(Y).$$

v) If k subgroups of data consisting of $N_1, N_2, \ldots, N_k$ ($\sum N_i = N$) observations have respective
means $\mu_1, \mu_2, \ldots, \mu_k$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, then the variance $\sigma^2$ of the combined N
observations is given by

$$\sigma^2 = \frac{1}{N}\sum N_i (\sigma_i^2 + D_i^2), \qquad i = 1, 2, \ldots, k,$$

where $D_i = \mu_i - \mu$ and $\mu$ is the mean for all the data.

For the ith subgroup with mean $\mu_i$, let $\mu$, the general mean, be considered as an arbitrary
origin. Then the sum of squares of deviations of the observations in the ith subgroup from $\mu$
is given by

$$\sum_{j=1}^{N_i} (x_{ij} - \mu)^2 = \sum_{j=1}^{N_i} (x_{ij} - \mu_i + \mu_i - \mu)^2$$

$$= \sum_{j=1}^{N_i} (x_{ij} - \mu_i)^2 + N_i (\mu_i - \mu)^2 \qquad (\because \text{the product term vanishes})$$

$$= N_i \sigma_i^2 + N_i D_i^2 = N_i (\sigma_i^2 + D_i^2).$$

But the variance $\sigma^2$ of the combined observations is the mean of the squared deviations of all the
observations in the k subgroups from the general mean $\mu$. Hence, summing over the k subgroups, we get

$$N\sigma^2 = \sum N_i (\sigma_i^2 + D_i^2).$$

It is relevant to note that all these properties are valid for the standard deviation (S.D.), which is the positive
square root of the variance. In other words,

i) S.D.(a) = 0.

ii) S.D.(X ± a) = S.D.(X).

iii) S.D.(aX) = |a| S.D.(X), as the S.D. cannot be negative.

iv) S.D.(X ± Y) = $\sqrt{\text{Var}(X) + \text{Var}(Y)}$.

v) $\sigma = \sqrt{\frac{1}{N}\sum N_i (\sigma_i^2 + D_i^2)}$.

For sample data, the results may be obtained in the same way.


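Example 4.11 Let $\bar{x}_1$ and $S_1^2$ be the mean and variance respectively of $n_1$ observations, and $\bar{x}_2$ and
$S_2^2$ be the mean and variance respectively of $n_2$ observations. If the variance of all $n_1 + n_2$
observations is $S^2$, prove that

$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2. \qquad \text{(P.U., B.A./B.Sc.)}$$

Let $\bar{x}$ denote the general mean and be regarded as an arbitrary origin for the set of $n_1$ observations
and the set of $n_2$ observations. Then the variance of all $n_1 + n_2$ observations, by definition, is given by

$$S^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2\right]$$

$$= \frac{1}{n_1 + n_2}\left[n_1\{S_1^2 + (\bar{x}_1 - \bar{x})^2\} + n_2\{S_2^2 + (\bar{x}_2 - \bar{x})^2\}\right]$$

$$= \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2} + \frac{n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2}.$$

Since $\bar{x}$ is the mean of all $n_1 + n_2$ observations, i.e. $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$, we have

$$\bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2},$$

$$\bar{x}_2 - \bar{x} = \bar{x}_2 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{-n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}.$$

Substituting and simplifying, we get

$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2.$$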
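The combined-variance result can be checked numerically. The sketch below splits the ten observations of Example 4.5 into two arbitrary subgroups (the split is an assumption made only for illustration) and compares the formula with the variance of the pooled data.

```python
from statistics import mean, pvariance

def combined_variance(groups):
    """Variance of the pooled data from subgroup sizes, means and variances."""
    total = sum(len(g) for g in groups)
    general_mean = sum(len(g) * mean(g) for g in groups) / total
    return sum(
        len(g) * (pvariance(g) + (mean(g) - general_mean) ** 2) for g in groups
    ) / total

group_1 = [7, 8, 10, 13, 14]
group_2 = [19, 20, 25, 26, 28]

from_formula = combined_variance([group_1, group_2])
from_pooled = pvariance(group_1 + group_2)
print(round(from_formula, 1), round(from_pooled, 1))
```

The pooled variance is larger than either subgroup variance here because the subgroup means differ, which is the contribution of the $D_i^2$ terms.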
4.5.5 Standardized Variables. A variable is defined to be standardized or in standard units if it is
expressed in terms of deviations from its mean, divided by its standard deviation. Symbolically,
this means that

$$Z_i = \frac{x_i - \mu}{\sigma}, \text{ for population data,}$$

$$Z_i = \frac{x_i - \bar{x}}{S}, \text{ for sample data.}$$

This is a very important concept in advanced statistics as the mean of a standardized variable is
zero and its variance is equal to one. Thus

$$\bar{Z} = \frac{1}{N}\sum \frac{x_i - \mu}{\sigma} = \frac{1}{\sigma} \cdot \frac{\sum (x_i - \mu)}{N} = 0,$$

$$\text{and Var}(Z) = \frac{1}{N}\sum (Z_i - 0)^2 = \frac{1}{N}\sum \frac{(x_i - \mu)^2}{\sigma^2} = \frac{1}{\sigma^2} \cdot \frac{\sum (x_i - \mu)^2}{N} = \frac{\sigma^2}{\sigma^2} = 1.$$

The Z-values, being independent of the units of measurement, provide a basis for comparison
between individual values, even though they belong to different distributions. That is why they are often
used in psychological and educational testing, where they are known as standard scores. The negative
numbers are avoided by multiplying the Z values by 10, an arbitrary S.D., and adding 50, an arbitrary
mean, to them. The values so obtained are called the standard Z scores. Thus a standard Z score is given
by the relation

$$Z = 50 + 10\left(\frac{x - \bar{x}}{S}\right).$$
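Standardization is simple to carry out by machine. The following sketch standardizes the nine marks used earlier (with the divisor-n standard deviation) and then converts them to standard scores with mean 50 and S.D. 10; the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
centre, sd = mean(marks), pstdev(marks)

z_values = [(x - centre) / sd for x in marks]   # standardized variable
standard_scores = [50 + 10 * z for z in z_values]  # rescaled to mean 50, S.D. 10

print(round(mean(z_values), 6), round(pvariance(z_values), 6))
print([round(s, 1) for s in standard_scores])
```

Up to rounding error, the Z-values have mean zero and variance one, as shown algebraically above.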


4.6 TRIMMED AND WINSORIZED MEASURES 


Data sets often contain extreme (unusually large or small) observations which may be quite
different from the main body of the data set and may seem to be incorrect. Such extreme observations are
generally called Wild observations or Outliers. These outliers can cause problems. In the presence of
outliers, the mean and the standard deviation, being affected by the extreme observations, are then
misleading measures of central tendency and variability. The appropriate measures then may be the
Median and the Interquartile Range, which are much less sensitive to extreme or wild observations.
However, it is important to examine a data set for outliers and, if present, they should be excluded.


For this purpose, we either remove a certain percentage of the smallest and largest observations to
get the so-called Trimmed data set, or replace the trimmed values by those next in magnitude to obtain what
is known as the Winsorized data set (proposed by C.P. Winsor). The mean and the standard deviation of these
data sets are known as the Trimmed Mean and the Trimmed Standard Deviation, and the Winsorized
Mean and the Winsorized Standard Deviation, respectively.


Generally, the Trimmed mean is obtained from the data set after having removed all observations
below the first quartile and all observations above the third quartile. The Winsorized mean is calculated
from the modified data set obtained by replacing each observation below the first quartile with the value
of the first quartile and each observation above the third quartile with the value of the third quartile. The
Trimmed standard deviation and the Winsorized standard deviation are computed from the trimmed data
set and the Winsorized data set respectively. The trimmed and Winsorized measures have gained importance in
recent years as they are not distorted by the presence of a few wild observations and have been found
almost as good as the corresponding measures in symmetric distributions with no unusual observations.


Example 4.12 Calculate the trimmed and Winsorized means and standard deviations for the data
given in Example 3.11.


The data ordered from smallest to largest and the two quartiles were found to be
32, 36, 36, 37, 39, 41, 45, 46, 48, with $Q_1 = 36$ and $Q_3 = 45$.


To find the trimmed mean and the trimmed standard deviation, we remove the two observations 32
and 36 below the first quartile and the two observations 46 and 48 above the third quartile. Thus we get the
five observations 36, 37, 39, 41, 45 as the trimmed data set.

$$\text{Trimmed mean} = \frac{\sum x}{n} = \frac{198}{5} = 39.6, \text{ and}$$

$$\text{Trimmed S.D.} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{7892}{5} - \left(\frac{198}{5}\right)^2} = \sqrt{1578.4 - 1568.16} = \sqrt{10.24} = 3.2.$$

To find the Winsorized mean and standard deviation, we replace the two values 32, 36 below the first
quartile with 36, and the two values 46, 48 above $Q_3$ with 45 to get the Winsorized data set as 36, 36, 36,
37, 39, 41, 45, 45 and 45. Thus

$$\text{the Winsorized mean} = \frac{\sum x}{n} = \frac{360}{9} = 40, \text{ and}$$

$$\text{the Winsorized S.D.} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{14534}{9} - \left(\frac{360}{9}\right)^2} = \sqrt{1614.89 - 1600} = \sqrt{14.89} = 3.86.$$
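The trimming and Winsorizing steps of Example 4.12 can be sketched as follows. As a simplifying assumption, the sketch removes (or replaces) a fixed count k = 2 of observations at each end of the ordered data, which reproduces the data sets used in the example; it does not compute the quartiles itself. The function names are hypothetical.

```python
from statistics import mean, pstdev

def trim(data, k):
    """Drop the k smallest and the k largest observations."""
    ordered = sorted(data)
    return ordered[k:len(ordered) - k]

def winsorize(data, k):
    """Replace the k smallest and k largest values by those next in magnitude."""
    ordered = sorted(data)
    core = ordered[k:len(ordered) - k]
    return [core[0]] * k + core + [core[-1]] * k

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
trimmed = trim(marks, 2)          # [36, 37, 39, 41, 45]
winsorized = winsorize(marks, 2)  # [36, 36, 36, 37, 39, 41, 45, 45, 45]

print(mean(trimmed), round(pstdev(trimmed), 2))
print(mean(winsorized), round(pstdev(winsorized), 2))
```

Both standard deviations are smaller than the 5.08 marks of the full data set, since the extreme observations no longer contribute at full weight.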


4.7 MOMENTS

A moment designates the power to which deviations are raised before averaging them, e.g. the
quantity $\frac{1}{N}\sum (x_i - \mu)$ is called the first population moment about the mean and is denoted by $\mu_1$. Similarly, the quantity
$\frac{1}{N}\sum (x_i - \mu)^2$ is called the second population moment about the mean and is denoted by $\mu_2$. The corresponding sample
moments are denoted by $m_1$ and $m_2$. In general, the rth moment about the mean is the arithmetic mean
of the rth power of the deviations of the observations from the mean. In symbols, this means that

$$\mu_r = \frac{1}{N}\sum (x_i - \mu)^r, \text{ for population data,}$$

$$m_r = \frac{1}{n}\sum (x_i - \bar{x})^r, \text{ for sample data.}$$

These moments are also called the central moments or the mean moments and are used to describe
a set of data.

In a similar way, moments about an arbitrary origin, say a, are defined by the relation

$$\mu'_r = \frac{1}{N}\sum (x_i - a)^r, \text{ for population data,}$$

$$m'_r = \frac{1}{n}\sum (x_i - a)^r, \text{ for sample data.}$$

If we put r = 0, we see that
$\mu_0 = \mu'_0 = 1$ and $m_0 = m'_0 = 1$.
For r = 1, we have

$$\mu_1 = \frac{1}{N}\sum (x_i - \mu) = \frac{\sum x_i}{N} - \mu = \mu - \mu = 0, \text{ and}$$

$$\mu'_1 = \frac{1}{N}\sum (x_i - a) = \frac{\sum x_i}{N} - a = \mu - a.$$

The corresponding sample results are $m_1 = 0$ and $m'_1 = \bar{x} - a$.
Putting r = 2 in the relation for mean moments, we see that

$$\mu_2 = \frac{1}{N}\sum (x_i - \mu)^2 = \sigma^2, \text{ which is the population variance,}$$

$$\text{and } m_2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = S^2, \text{ which is the sample variance.}$$

When a = 0, the moment $m'_r = \frac{1}{n}\sum x_i^r$ is called the rth moment about zero (the origin).

The moments about the mean or about the arbitrary origin are also called the power moments.

When the sample data are grouped into a frequency distribution having k classes with midpoints
$x_1, x_2, \ldots, x_k$ and the corresponding frequencies $f_1, f_2, \ldots, f_k$ ($\sum f_i = n$), the rth sample moments are
given by

$$m_r = \frac{1}{n}\sum f_i (x_i - \bar{x})^r, \text{ and}$$

$$m'_r = \frac{1}{n}\sum f_i (x_i - a)^r.$$


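4.7.1 Moments about the Mean in terms of Moments about the arbitrary origin, say a, and
conversely. It is easier to calculate the moments, in the first instance, about an arbitrary origin. They are
then transformed to the mean moments. This is done by using the relationships obtained as follows.

By definition, the rth sample moment about the mean is given by

$$m_r = \frac{1}{n}\sum f_i (x_i - \bar{x})^r.$$

The quantity within brackets may be written as

$$(x_i - \bar{x}) = (x_i - a + a - \bar{x}) = (x_i - a) - (\bar{x} - a) = D_i - m'_1, \text{ where } D_i = (x_i - a) \text{ and } m'_1 = (\bar{x} - a).$$

Thus, we have $m_r = \frac{1}{n}\sum f_i (D_i - m'_1)^r$.

By means of the Binomial expansion, we have

$$m_r = \frac{1}{n}\sum f_i \left[D_i^r - \binom{r}{1} D_i^{r-1} m'_1 + \binom{r}{2} D_i^{r-2} (m'_1)^2 - \ldots + (-1)^r (m'_1)^r\right],$$

where $\binom{r}{j} = \frac{r!}{j!(r-j)!}$ and $r! = r(r-1)(r-2) \ldots 3 \times 2 \times 1$. Therefore

$$m_r = \frac{1}{n}\sum f_i D_i^r - \binom{r}{1} m'_1 \cdot \frac{1}{n}\sum f_i D_i^{r-1} + \binom{r}{2} (m'_1)^2 \cdot \frac{1}{n}\sum f_i D_i^{r-2} - \ldots + (-1)^r (m'_1)^r \cdot \frac{1}{n}\sum f_i$$

$$= m'_r - \binom{r}{1} m'_{r-1} m'_1 + \binom{r}{2} m'_{r-2} (m'_1)^2 - \ldots + (-1)^r (m'_1)^r.$$

Putting r = 1, 2, 3 and 4, we get

$$m_1 = m'_1 - m'_1 = 0,$$
$$m_2 = m'_2 - 2 m'_1 m'_1 + (m'_1)^2 m'_0 = m'_2 - (m'_1)^2,$$
$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3, \text{ and}$$
$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4.$$

For a population of size N, the corresponding relations are

$$\mu_1 = 0,$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2,$$
$$\mu_3 = \mu'_3 - 3 \mu'_2 \mu'_1 + 2 (\mu'_1)^3,$$
$$\mu_4 = \mu'_4 - 4 \mu'_3 \mu'_1 + 6 \mu'_2 (\mu'_1)^2 - 3 (\mu'_1)^4.$$

It should be noted that in each of these relations, the sum of the co-efficients of various terms on
the right side equals zero and each term on the right is of the same dimension as the term on the left.

Conversely, the rth sample moment about an arbitrary origin, a, is given by

$$m'_r = \frac{1}{n}\sum f_i (x_i - a)^r$$
$$= \frac{1}{n}\sum f_i (x_i - \bar{x} + \bar{x} - a)^r$$
$$= \frac{1}{n}\sum f_i (d_i + m'_1)^r, \text{ where } d_i = x_i - \bar{x} \text{ and } m'_1 = \bar{x} - a,$$
$$= \frac{1}{n}\sum f_i \left[d_i^r + \binom{r}{1} d_i^{r-1} m'_1 + \binom{r}{2} d_i^{r-2} (m'_1)^2 + \ldots + (m'_1)^r\right]$$
$$= m_r + \binom{r}{1} m_{r-1} m'_1 + \binom{r}{2} m_{r-2} (m'_1)^2 + \ldots + (m'_1)^r.$$

Putting r = 2, 3 and 4, we get

$$m'_2 = m_2 + (m'_1)^2,$$
$$m'_3 = m_3 + 3 m_2 m'_1 + (m'_1)^3,$$
$$m'_4 = m_4 + 4 m_3 m'_1 + 6 m_2 (m'_1)^2 + (m'_1)^4.$$

For a population of size N, the corresponding relations are

$$\mu'_2 = \mu_2 + (\mu'_1)^2,$$
$$\mu'_3 = \mu_3 + 3 \mu_2 \mu'_1 + (\mu'_1)^3,$$
$$\mu'_4 = \mu_4 + 4 \mu_3 \mu'_1 + 6 \mu_2 (\mu'_1)^2 + (\mu'_1)^4.$$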
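The conversion from moments about an arbitrary origin to moments about the mean can be written as a small function. This is a sketch only; the function name and argument names are hypothetical, and the input values in the usage lines are moments about an arbitrary origin chosen for illustration.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four moments about an arbitrary origin (m1..m4)
    into the second, third and fourth moments about the mean."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    c4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return c2, c3, c4

# Illustrative moments about an arbitrary origin.
c2, c3, c4 = central_from_raw(0.4, 3.2, 6.6, 30.8)
print(round(c2, 2), round(c3, 2), round(c4, 2))
```

In each line the coefficients 1, -1; 1, -3, 2; and 1, -4, 6, -3 sum to zero, which is a quick way to check that the relations have been copied correctly.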

4.7.2 Sheppard's Corrections. In the calculation of moments from a grouped frequency
distribution, certain errors are introduced by the assumption that the frequencies associated with a class
are located at the midpoint of the class interval. These errors therefore need corrections. It has been
shown by W.F. Sheppard that, if the frequency distribution (i) is continuous and (ii) tails off to zero at
each end, the corrected moments are as given below:

$$m_2 \text{ (corrected)} = m_2 \text{ (uncorrected)} - \frac{h^2}{12},$$

$$m_3 \text{ (corrected)} = m_3 \text{ (uncorrected)},$$

$$m_4 \text{ (corrected)} = m_4 \text{ (uncorrected)} - \frac{h^2}{2} m_2 \text{ (uncorrected)} + \frac{7}{240} h^4,$$

where h denotes the uniform class-interval of the
frequency distribution. The important point to note here is that these corrections are not applicable to
highly skewed distributions and distributions having unequal class-intervals.


4.7.3 Moment-Ratios. There are certain ratios in which both the numerators and the denominators
are moments. The most common of these moment-ratios are $\beta_1$ and $\beta_2$, defined by the relations

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

They are independent of origin and units of measurement, i.e. they are pure
numbers. Actually, $\beta_1$ is the square of the third population moment expressed in standard units, and $\beta_2$
is the fourth standardized moment for a population, where a standardized variable has been defined as
$Z = (x - \mu)/\sigma$.

For symmetrical distributions, $\beta_1$ is equal to zero. It is, therefore, used as a measure of skewness. $\beta_2$ is
used to explain the shape of the curve and is a measure of peakedness. For the normal distribution,
discussed later, $\beta_2 = 3$.

The moment-ratios (or the standardized moments) for sample data are similarly defined as

$$b_1 = \frac{(m_3)^2}{(m_2)^3} \quad \text{and} \quad b_2 = \frac{m_4}{(m_2)^2}.$$
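Example 4.13 Calculate the first four moments about the mean for the following set of
examination marks: 45, 32, 37, 46, 39, 36, 41, 48 and 36.

For convenience, the observed values are written in an increasing sequence. The necessary
calculations appear in the table below:

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $(x_i - \bar{x})^3$ | $(x_i - \bar{x})^4$ |
| 32 | -8 | 64 | -512 | 4096 |
| 36 | -4 | 16 | -64 | 256 |
| 36 | -4 | 16 | -64 | 256 |
| 37 | -3 | 9 | -27 | 81 |
| 39 | -1 | 1 | -1 | 1 |
| 41 | 1 | 1 | 1 | 1 |
| 45 | 5 | 25 | 125 | 625 |
| 46 | 6 | 36 | 216 | 1296 |
| 48 | 8 | 64 | 512 | 4096 |
| 360 | 0 | 232 | 186 | 10708 |

$$\bar{x} = \frac{360}{9} = 40 \text{ marks}, \qquad m_1 = \frac{\sum (x_i - \bar{x})}{n} = \frac{0}{9} = 0,$$

$$m_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{232}{9} = 25.78 \text{ (marks)}^2,$$

$$m_3 = \frac{\sum (x_i - \bar{x})^3}{n} = \frac{186}{9} = 20.67 \text{ (marks)}^3,$$

$$m_4 = \frac{\sum (x_i - \bar{x})^4}{n} = \frac{10708}{9} = 1189.78 \text{ (marks)}^4.$$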
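The moments of Example 4.13 follow directly from the definition. A minimal sketch in Python, with `central_moment` as an illustrative helper name:

```python
def central_moment(data, r):
    """r-th moment about the mean: mean of the r-th powers of the deviations."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** r for x in data) / n

marks = [45, 32, 37, 46, 39, 36, 41, 48, 36]
moments = [central_moment(marks, r) for r in (1, 2, 3, 4)]
print([round(m, 2) for m in moments])
```

The first moment about the mean is always zero, and the second is the variance $S^2$ found in Example 4.6.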


Example 4.14 Compute the first four moments for the following distribution of wages after
applying Sheppard's corrections.

[Frequency table: weekly earnings (rupees) against number of men; entries not legible in the source.]
(P.U., B.A./B.Sc. (Part-I), 1962)

We first calculate moments about an arbitrary origin. The necessary calculations are shown below.
The moments about x = 10 are obtained by dividing the column sums by n.

[Computation table with columns: Earnings in Rs. ($x_i$), Men ($f_i$), $D_i = x_i - 10$, $f_i D_i$, $f_i D_i^2$, $f_i D_i^3$, $f_i D_i^4$; entries not legible in the source.]

The column sums divided by n give $m'_1 = 0.06$, $m'_2 = 2.64$, $m'_3 = 0.56$ and $m'_4 = 28.38$.

Moments about the mean are:

$$m_1 = 0,$$
$$m_2 = m'_2 - (m'_1)^2 = 2.64 - (0.06)^2 = 2.64,$$
$$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 = 0.56 - 3(2.64)(0.06) + 2(0.06)^3 = 0.08,$$
$$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4 = 28.38 - 4(0.56)(0.06) + 6(2.64)(0.06)^2 - 3(0.06)^4 = 28.30.$$

Applying Sheppard's corrections (with h = 1), we have

$$m_2 \text{ (corrected)} = m_2 \text{ (uncorrected)} - \frac{h^2}{12} = 2.64 - 0.08 = 2.56,$$
$$m_3 \text{ (corrected)} = m_3 \text{ (uncorrected)} = 0.08,$$
$$m_4 \text{ (corrected)} = m_4 \text{ (uncorrected)} - \frac{h^2}{2} m_2 \text{ (uncorrected)} + \frac{7}{240} h^4 = 28.30 - 1.32 + 0.03 = 27.01.$$

4.7.4 Change of Origin and Scale. Let a and h denote the arbitrary origin and the class-interval.
Then we define a new variable u as

$$u_i = \frac{x_i - a}{h},$$

so that $x_i - a = h u_i$, $\bar{x} - a = h\bar{u}$ and hence $x_i - \bar{x} = h(u_i - \bar{u})$.
Substituting these values in the rth sample moments, we get

$$m'_r = \frac{1}{n}\sum f_i (x_i - a)^r = h^r \cdot \frac{1}{n}\sum f_i u_i^r,$$

$$\text{and } m_r = \frac{1}{n}\sum f_i (x_i - \bar{x})^r = h^r \cdot \frac{1}{n}\sum f_i (u_i - \bar{u})^r.$$

This shows that the rth moments of the variable X are $h^r$ times the corresponding moments of the
variable u, and that the moments about the mean are independent of the origin 'a'. In other words, the
moments about the mean are not affected by a change of origin but are affected by a change of scale.


4.7.5 Charlier Check. We have seen that the computation of the moments depends upon the sum
of products of the frequencies by the corresponding values of the variable. It is, therefore, desirable to
check these computations so that arithmetic mistakes, if any, are avoided. For this purpose, C.V.L. Charlier,
a Swedish statistician, introduced a check known as the Charlier check. This check actually consists in
shifting the assumed origin in the coded form by one interval. The relations used for this purpose are
given below:

$$\sum f(u + 1) = \sum f u + n,$$
$$\sum f(u + 1)^2 = \sum f u^2 + 2\sum f u + n,$$
$$\sum f(u + 1)^3 = \sum f u^3 + 3\sum f u^2 + 3\sum f u + n,$$
$$\sum f(u + 1)^4 = \sum f u^4 + 4\sum f u^3 + 6\sum f u^2 + 4\sum f u + n.$$
Example 4.15 Calculate the first four moments КУРУ mean from the data of Example 4.4. 


The necessary calculations by taking u; = 5% до, are set out in the following table. The last 
© 


is used for Charlier's check and the colqfun sums аге divided by n to get т,. 
Computations 


SS 
oe PAE. a Ee 
ы 


ые — 32€] 192 [ 396 [1%] 4740 


Sums-n For 
aes Ы 


1 check, 
fu! = X fu* «4Y fu! +67, fu? «AY fun 
= 1848 + 4(396) + 6(192) + 4(24) + 60 
= 1848 + 1584 + 1152 + 96 + 60 = 4740, which is the sum in the last column. 
Hence the moments about the mean, in class-interval units, are obtained as below:

$m_1 = 0$

$m_2 = m'_2 - (m'_1)^2$
= 3.2 − (0.4)² = 3.04

$m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$
= 6.6 − 3(3.2)(0.4) + 2(0.4)³ = 2.89

$m_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$
= 30.8 − 4(6.6)(0.4) + 6(3.2)(0.4)² − 3(0.4)⁴ = 23.24

To get the moments about the mean in ordinary units, we multiply $m_2$ by $h^2$, i.e. 400, $m_3$ by (20)³ and $m_4$ by (20)⁴. Thus $m_2 = 1216$, $m_3 = 23120$ and $m_4 = 3718400$.
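
The arithmetic of Example 4.15 can be retraced with a short Python sketch. It starts from the column sums quoted in the example; the function names `charlier_sum_4` and `central_from_raw` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def charlier_sum_4(n, s1, s2, s3, s4):
    """Expected value of sum f(u+1)^4 from the sums of f*u^k (k = 1..4)."""
    return s4 + 4 * s3 + 6 * s2 + 4 * s1 + n


def central_from_raw(m1, m2, m3, m4):
    """Convert moments about an arbitrary origin into moments about the mean."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    c4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return c2, c3, c4


# Column sums taken from Example 4.15.
n, s1, s2, s3, s4 = 60, 24, 192, 396, 1848
check = charlier_sum_4(n, s1, s2, s3, s4)  # should match the last column sum

# Moments about the assumed origin, in class-interval units (exact fractions).
raw = [Fraction(s, n) for s in (s1, s2, s3, s4)]
c2, c3, c4 = central_from_raw(*raw)

# Back to ordinary units with h = 20.
h = 20
m2, m3, m4 = c2 * h ** 2, c3 * h ** 3, c4 * h ** 4
```

Worked exactly, the third and fourth moments in class-interval units are 2.888 and 23.2352, so the unrounded values in ordinary units are 23104 and 3717632. The figures 23120 and 3718400 in the example come from scaling the rounded values 2.89 and 23.24.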


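Example 4.16 The first three moments of a distribution about the value 2 of the variable are 1, 16 and −40. Show that the mean is 3, the variance 15 and $m_3 = -86$. Also show that the first three moments about x = 0 are 3, 24 and 76.

Here we are given

$$m'_1 = \frac{1}{n}\sum f(x-2) = 1 \qquad (1)$$
$$m'_2 = \frac{1}{n}\sum f(x-2)^2 = 16 \qquad (2)$$
$$m'_3 = \frac{1}{n}\sum f(x-2)^3 = -40 \qquad (3)$$

We know that $m'_1 = \bar{x} - a$, so that
$\bar{x} = m'_1 + a = 1 + 2 = 3$ (a = 2).
And variance, $S^2 = m_2$ (second moment about mean)
$= m'_2 - (m'_1)^2 = 16 - 1 = 15$, and
$m_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$
$= -40 - 3(16)(1) + 2(1)^3 = -86$.

To find the moments about x = 0, we need the values of $\frac{1}{n}\sum fx$, $\frac{1}{n}\sum fx^2$ and $\frac{1}{n}\sum fx^3$, obtained from relations (1), (2) and (3).

From (1), $\frac{1}{n}\sum fx - 2 = 1$
or $\frac{1}{n}\sum fx = 3$.

From (2), $\frac{1}{n}\sum f(x-2)^2 = 16$
or $\frac{1}{n}\sum f(x^2 - 4x + 4) = 16$
or $\frac{1}{n}\sum fx^2 - \frac{4}{n}\sum fx + 4 = 16$
or $\frac{1}{n}\sum fx^2 = 16 - 4 + 4(3) = 24$.

(3) on expansion can be written as
$\frac{1}{n}\sum fx^3 - \frac{6}{n}\sum fx^2 + \frac{12}{n}\sum fx - 8 = -40$
or $\frac{1}{n}\sum fx^3 = -40 + 8 - 12(3) + 6(24) = 76$.

Hence the moments about x = 0 are

$m'_1 = \frac{1}{n}\sum fx = 3$,
$m'_2 = \frac{1}{n}\sum fx^2 = 24$, and
$m'_3 = \frac{1}{n}\sum fx^3 = 76$.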
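
The change of origin used in Example 4.16 is a binomial expansion of $(x - b)^r = [(x - a) + (a - b)]^r$. The Python sketch below applies that expansion to the moments given in the example; `shift_origin` is a hypothetical helper name introduced only for this illustration.

```python
from math import comb


def shift_origin(moments_about_a, a, b):
    """Convert moments about the point a into moments about the point b.

    moments_about_a lists the 1st, 2nd, ... moments about a.
    """
    raw = [1] + list(moments_about_a)  # the zeroth moment is always 1
    d = a - b
    return [
        sum(comb(r, k) * d ** (r - k) * raw[k] for k in range(r + 1))
        for r in range(1, len(raw))
    ]


given = [1, 16, -40]                     # moments about x = 2 (Example 4.16)
about_zero = shift_origin(given, 2, 0)   # moments about x = 0
about_mean = shift_origin(given, 2, 3)   # moments about the mean, 3
```

Here `about_zero` reproduces 3, 24 and 76, and `about_mean` gives 0, 15 and −86, in agreement with the example.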
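Example 4.17 Show that for discrete distributions, $\beta_2 \ge 1$, where $\beta_2 = \mu_4/\mu_2^2$. (P.U., D.St. 1962)

Now $\beta_2$ will not be less than one if the numerator is not less than the denominator, i.e. if $\mu_4 \ge \mu_2^2$, or if $\mu_4 - \mu_2^2 \ge 0$. Writing $\sigma^2 = \mu_2$ and using $\frac{1}{n}\sum f(x-\mu)^2 = \sigma^2$ and $\frac{1}{n}\sum f = 1$, we have

$$\mu_4 - \mu_2^2 = \frac{1}{n}\sum f(x-\mu)^4 - \sigma^4$$
$$= \frac{1}{n}\sum f(x-\mu)^4 - 2\sigma^4 + \sigma^4$$
$$= \frac{1}{n}\sum f(x-\mu)^4 - 2\sigma^2 \cdot \frac{1}{n}\sum f(x-\mu)^2 + \sigma^4 \cdot \frac{1}{n}\sum f$$
$$= \frac{1}{n}\sum f\left[(x-\mu)^2 - \sigma^2\right]^2,$$

which is a sum of squares and therefore cannot be negative.

Hence $\beta_2 \ge 1$ because $\mu_4 - \mu_2^2$ is never negative.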


4.8 SKEWNESS 


A distribution in which the values equidistant from the mean have equal frequencies is defined to be symmetrical and any departure from symmetry is called skewness. It is important to note that in a perfectly symmetrical distribution, the mean, median and mode coincide and that the two tails of the distribution are equal in length from the mean. These values are pulled apart when the distribution departs from symmetry and consequently one tail becomes longer than the other. If the right tail is longer than the left tail, the distribution is said to have positive skewness. If the left tail of the distribution is longer than its right tail, it is said to be negatively skewed or to have negative skewness. In a positively skewed distribution, the mean is greater than the median and the median is greater than the mode, i.e. mean > median > mode, and in a negatively skewed distribution, mode > median > mean.


[Figure: negatively skewed frequency curve]
The difference between the mean and the mode, being an indication of the amount of skewness or asymmetry, is used as a measure of skewness. A measure of skewness is defined in such a way that (i) the measure should be zero when the distribution is symmetric and (ii) the measure should be a pure number, i.e. independent of origin and units of measurements.


According to the degree of skewness of a distribution or curve, Karl Pearson (1857-1936) introduced a coefficient of skewness denoted by Sk and defined by

$$Sk = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

We know that mode is sometimes ill-defined and is difficult to locate by simple methods. It is, therefore, replaced by its equivalent from the empirical relation holding good in moderately skewed distributions. The Pearsonian co-efficient of skewness then becomes

$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

The coefficient usually varies between −3 (negative skewness) and +3 (positive skewness) and the sign indicates the direction of skewness. The formula satisfies both the requirements considered desirable for a measure of skewness.
Arthur Lyon Bowley (1869-1957), a British statistician, has also proposed a measure of skewness which is based on the median and the two quartiles. In a symmetrical distribution, the two quartiles are equidistant from the median but in an asymmetrical distribution, this will not be the case. Bowley's coefficient of skewness is

$$Sk = \frac{Q_1 + Q_3 - 2\,\text{Median}}{Q_3 - Q_1}$$

Its value lies between −1 and +1.


Another measure of skewness that is often used is the third moment expressed in standard units (or moment ratio) and thus is given by

$$Sk = \frac{\mu_3}{\sigma^3} \text{ for population data,}$$

$$\phantom{Sk} = \frac{m_3}{S^3} \text{ for sample data.}$$

This coefficient, for most distributions, will be between −2 and +2. Some statisticians denote it by $\alpha_3$.

If the coefficient is greater than zero, the distribution or curve is positively skewed. If Sk < 0, there is negative skewness. For symmetrical distributions or curves, the coefficient is zero.
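
The three measures of skewness described above can be compared side by side. The following Python sketch is a minimal illustration; the function names and all the numbers passed to them are assumptions made for the example, not values from the text.

```python
def pearson_skewness(mean, median, sd):
    """Pearsonian coefficient: 3(Mean - Median) / Standard deviation."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient, based on the median and the two quartiles."""
    return (q1 + q3 - 2 * median) / (q3 - q1)


def moment_skewness(values):
    """Third moment in standard units, m3 / S^3, for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical summary figures, assumed for illustration only.
sk_pearson = pearson_skewness(mean=52, median=50, sd=6)
sk_bowley = bowley_skewness(q1=20, median=28, q3=40)

# A symmetric data set and one with a long right tail.
sk_symmetric = moment_skewness([1, 2, 3, 4, 5])
sk_right_tail = moment_skewness([1, 1, 2, 2, 3, 10])
```

All three functions return zero for a symmetric distribution and a positive value when the right tail is the longer one, so their signs agree even though their magnitudes are not directly comparable.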


4.9 KURTOSIS

Karl Pearson (1857-1936) introduced the term Kurtosis (literally the amount of hump) for the degree of peakedness or flatness of a unimodal frequency curve. When the values of a variable are closely bunched round the mode in such a way that the peak of the curve becomes relatively high, we say that the curve is leptokurtic. If, on the other hand, the curve is flat-topped, we say that the curve is platykurtic. Since the normal curve (to be described later) is neither very peaked nor very flat-topped, it is taken as basis for comparison. The normal curve itself is called mesokurtic.

Kurtosis is usually measured by the fourth standardized moment or the moment-ratio $\beta_2 = \mu_4/\mu_2^2$, whose value for a normal distribution is equal to 3. When $\beta_2$ is greater than 3, the curve is more sharply peaked and has wider tails than the normal curve and is said to be leptokurtic. When it is less than 3, the curve has a flatter top and relatively narrower tails than the normal curve and is said to be platykurtic.


[Figure: leptokurtic, mesokurtic and platykurtic curves]
The corresponding measure of kurtosis for the sample data is $b_2 = m_4/m_2^2$. It should be noted that the value of $b_2$ for a large sample from the normal population is very nearly 3.

Another measure of Kurtosis, not widely used, is given by

$$K = \frac{Q.D.}{P_{90} - P_{10}}$$

where Q.D. is the semi-interquartile range and P's are the percentiles. This is known as the Percentile co-efficient of kurtosis. It has been shown that K for a normal distribution is 0.263 and that it lies between 0 and 0.50.
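
Both measures of kurtosis can be illustrated with a few lines of Python. In this sketch, `moment_kurtosis` is a hypothetical helper, and the frequencies used are the theoretical binomial frequencies for eight fair coins tossed 256 times (an assumption made for illustration, not observed data). The normal-curve value of K is obtained from the standard library's `NormalDist.inv_cdf`.

```python
from math import comb
from statistics import NormalDist


def moment_kurtosis(values, freqs):
    """Moment-ratio b2 = m4 / m2^2 of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(values, freqs)) / n
    return m4 / m2 ** 2


# Theoretical frequencies of 0, 1, ..., 8 heads in 256 tosses of 8 fair coins.
heads = list(range(9))
ideal = [comb(8, k) for k in heads]
b2_coins = moment_kurtosis(heads, ideal)   # slightly platykurtic (below 3)

# Percentile coefficient of kurtosis for the standard normal curve.
z = NormalDist()
qd = (z.inv_cdf(0.75) - z.inv_cdf(0.25)) / 2
k_normal = qd / (z.inv_cdf(0.90) - z.inv_cdf(0.10))
```

The two-point distribution with equal frequencies at −1 and +1 gives `moment_kurtosis([-1, 1], [1, 1]) == 1.0`, the smallest value $b_2$ can take.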


4.10 DESCRIBING A FREQUENCY DISTRIBUTION 


To describe the major characteristics of a frequency distribution, we need the calculations of the following five quantities:

i) The number of observations that describes the size of the data.
ii) A measure of central tendency such as the mean or median that provides information about the centre or average value.
iii) A measure of dispersion such as standard deviation that indicates the variability of the data.
iv) A measure of skewness that shows the lack of symmetry in the frequency distribution.
v) A measure of kurtosis that gives information about its peakedness.

It is interesting to note that all these quantities can be derived from the first four moments. For example, the first moment about x = 0 is the arithmetic mean, the second moment about the mean is the variance and its positive square root is the standard deviation. The third mean moment is a measure of skewness while the fourth central moment is used to measure kurtosis. Thus the first four moments play a key role in describing frequency distributions.
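
The five quantities listed above can be gathered in one small routine. The Python sketch below is an illustration under stated assumptions: `describe` is a hypothetical function name, the skewness is taken as $m_3/S^3$ and the kurtosis as $m_4/m_2^2$, and the frequency distribution is invented for the example.

```python
from math import sqrt


def describe(values, freqs):
    """Summarize a frequency distribution through its first four moments."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    m2, m3, m4 = (
        sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n
        for r in (2, 3, 4)
    )
    return {
        "n": n,                       # size of the data
        "mean": mean,                 # central tendency
        "sd": sqrt(m2),               # dispersion
        "skewness": m3 / m2 ** 1.5,   # lack of symmetry
        "kurtosis": m4 / m2 ** 2,     # peakedness
    }


# Hypothetical symmetric distribution, assumed for illustration only.
summary = describe([1, 2, 3, 4, 5], [1, 4, 6, 4, 1])
```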


EXERCISES
OBJECTIVE

a) Answer 'True' or 'False'. If the statement is not true, then replace the underlined words with words that make the statement true:

i) A measure of central tendency describes how widely the data are dispersed about a central value.
ii) The standard deviation for a set of values 5, 5, 5, 5, 5 and 5 is 5.
iii) The unit of measure for the standard score is always in standard deviation.
iv) For a bell shaped distribution, the range will be approximately equal to six standard deviations.
v) The difference between the largest and the smallest observations is called the Quartile Deviation.
vi) The square of variance gives the standard deviation.
vii) The coefficient of variation is an absolute measure of dispersion.
viii) The coefficient of variation is measured in different units as the data.
ix) The Quartile Deviation is based on only two values in the series.
x) The square root of the variance of a distribution is the absolute deviation.


MULTIPLE CHOICE QUESTIONS

i) The main disadvantage of the range is that
a) It does not use all the observations in its calculation.
b) It can be influenced by an extreme value.
c) Both a and b are correct.
d) None of the above.

ii) Which one of the following is not a measure of dispersion?
a) Range.
b) Standard deviation.
c) Second quartile.
d) Coefficient of variation.

iii) Which of the following is not a measure of dispersion?
a) Interquartile range.
b) Difference between the values of the largest and smallest items.
c) Mean of the values of the largest and smallest items.
d) Standard deviation.

iv) The standard deviation is
a) The square of the variance.
b) Two times the standard deviation.
c) Half the variance.
d) The square root of the variance.

v) The coefficient of variation is measured in
a) same units as the mean and the standard deviation.
b) Percent.
c) Squared units.
d) None of the above.

vi) If the original units are measured in pounds, the variance is
a) Also measured in pounds.
b) Measured in pounds squared.
c) Measured in half pounds.
d) None of the above.

vii) If the tail of a frequency distribution is in positive direction (to the right), the coefficient of skewness is
a) Zero.
b) Positive.
c) Negative.
d) None of the above.

viii) The standard deviation of a frequency distribution is 10, the mean is 250, the median is ... The coefficient of skewness is
a) Zero.
b) Positive.
c) Negative.
d) None of the above.

ix) Which of the following statements is true?
a) The standard deviation is less than the range.
b) The range is less than the interquartile range.
c) The arithmetic mean always exceeds the median.
d) The arithmetic mean always exceeds the mode.

x) Which of the following is not a property of the standard deviation?
a) It is always a negative number.
b) It is affected by extreme values in a data set.
c) It is based on all the values in the data set.
d) It is the most widely used measure of dispersion.

xi) If a distribution has zero standard deviation, then which of the following is true?
a) All observations are negative.
b) All observations are positive.
c) All observations are equal.
d) Number of positive values and negative values are equal.

xii) The empirical rule generally can be applied to
a) Bell shaped distribution.
b) Any distribution.
c) Only continuous distribution.
d) Any skewed distribution.

xiii) Symmetrical distribution will always have skewness equal to
a) Negative.
b) Positive.
c) Zero.
d) Approximately zero.

xiv) For a normal distribution the measure of kurtosis equals
a) Zero.
b) 3.
c) Positive number.
d) Negative number.

xv) For the given sample set 2, 8, 10, 15, 20, 9, 18, 0, 7, 10, which is the value of coefficient of variation?
a) 70.00 percent.
b) 15.50 percent.
c) 145.00 percent.
d) 61.21 percent.


SUBJECTIVE

4.1 Explain clearly the meaning of the term Dispersion. What are the most usual methods of measuring dispersion? Indicate the advantages and disadvantages of these methods.
(P.U., B.Com. 1960; B.A. (Hons.), 1960; B.A. (Part-I), 1961)

4.2 Discuss the different measures of dispersion. Describe the method of computation of any two of them with suitable examples. (P.U., M.A., Econ. 1969)

4.3 Describe carefully how Mean Deviation, Standard Deviation and Quartile Deviation of a given distribution are obtained. In what problems should each be used?
(P.U., B.A. (Part-I), 1962)


4.4 a) What is Range and how is it calculated? What are its uses?

b) Define Quartile Deviation. Find the quartile deviation from the following data (i) graphically, (ii) using an appropriate formula.

Income per week (Rs.) | 41-50 | 51-60 | 61-70 | 71-80 | 81-90 | 91-100 | Total
оаа — [3 [35 | @ | | n | м | 300] 


(P.U., B.AJB.Sc. 1 


4.5 The members of a sports club, 60 male adults, had their weights recorded, in pounds. The weights are given below:


171 160 144 132 154 160 160 158 148 160 ІЗІ 153 
131 165 139 163 ,149 149 140 149 150 161 136 144 
165 174 153 149 157 169 147 156 1 171 149 154 
153 149 147 154 145 158 160 15g 5456 138 167 142 
165 155 140 155 158 147 149 9 148 174 150 144 


Construct a cumulative frequency table for these weights, using classes of width 5 lb, starting at 129.5 lb. Hence draw a cumulative frequency graph, and use this to find the semi-interquartile range.

Use the grouped frequency table to calculate the mean and standard deviation, and compare them with the values obtained from the original ungrouped data.
(M.A. Econ. I.U. 1990; B.Z.U.


4.6 a) Define Mean Deviation and its co-efficient. Discuss its advantages and uses.

b) Estimate the mean deviation from the arithmetic mean of the following set of examination marks.
[N.fmém|2] 3] s |a| m юа |з | 


4.7 When originally collected, the data in the following distribution had been in 5-yearly age groups throughout the whole range 20-64. Using central values of these age-groups, the mean age had been calculated to be 44.5 years and the mean deviation without correction for grouping to be 7.15 years. Reconstruct the original table with 5-yearly age groups from the information given.


EIER 
[Меге _ | 1 | 2 | 26 | 2 | 2 | 15 | w | 


(P.C.S., 1971, 93; P.U., В.А./В. 
Hint: Let $f_1$ and $f_2$ be the frequencies corresponding to the groups 30-34 and 55-59 in 5-yearly


350 
12.5 f, 
195.0-7.5 f, 
55.0 
50.0 
112.5 
12.5% 
245 ou % 


% 
Now, Mean = iy №, 
п 


ік 445 ‚4550-9 Л + S 
c 74-2 E 
ACT 
and Mean Deviation — zx Лх-Я . 
Кс 
C 18--1-1715%8 - Аф 


or A- жы s 
imer fi 210 and f; =10. 


4.8 Find the quartile deviation and mean deviation as well as their co-efficients from the following data and comment.


ver 


| No. of Persons [бюрА | 101 [30 | 47 
| Group | 15] 20] 32] 351 35 22] 20] 16] 


4.9 a) Define Variance and Standard Deviation. Describe their properties.

b) For a population of numbers 10, 8, 7, 9, 5, 12, 8, 6, 8, 2, calculate $\sigma^2$ and $\sigma$.
(P.U., M.A. Econ. 1992)

4.10 a) Prove that the variance remains unchanged when a constant is added to or subtracted from every value of the variable.

b) The scores obtained by five students on a set of examination papers are 70, 50, 60, 72, 50. These scores are changed by (i) adding 10 points to all scores, (ii) increasing all scores by 10%. What effect will these changes have on the mean and on the standard deviation?
(P.U., B.A./B.Sc., 1971)
4.11 a) Define the mean and standard deviation of a distribution. $\bar{x}$ is the mean and $S_x$ is the standard deviation. When a provisional mean "a" is chosen, the corresponding provisional standard deviation is found to be $S_a$. Prove that $S_a^2 = S_x^2 + (\bar{x} - a)^2$. Explain briefly the advantages of this procedure in numerical work.
(P.U., B.A./B.Sc. 19

b) For a set of ungrouped values, the following sums are found: n = ..., $\sum x = 480$, $\sum x^2 = 15735$. Find the mean and the standard deviation.
(B.Z.U., B.A./B.Sc. 19

c) The mean and standard deviation of a sample of 20 observations were found to be 75 and 2.5 respectively. On checking the original figures, it was discovered that one observation, which was actually 68, was copied down as 86. Find the correct mean and standard deviation.


4.12 a) Describe the properties of the standard deviation.

b) By multiplying each of the numbers 3, 6, 2, 1, 7, 5 by 2 and then adding 5, we obtain 11, 17, 9, 7, 19, 15. What is the relation between the standard deviations and the means for the two sets?
(P.U. B.A./B.Sc. 19

c) A child is born to Mrs. X every ... years. Compute the standard deviation of the children's age when (i) the youngest is 1 year old, and (ii) the youngest is 8 years old. Why do the answers to (i) and (ii) coincide? (P.U., B.A./B.Sc.
4.13 Show how to compute the rey the Standard deviation of a sample of n ob 
and explain briefly the ~~ these statistics, giving examples of situations in which! 
would be used. 
Show that when n = T e estimates are simply related. 


4.14 It is often stated that in frequency distributions there exists the approximate relation

Mean Deviation / Standard Deviation = 0.8. Test this statement in the following distribution:


metal bars. 
s 30» 31, 332. 33 36 38. 26 37 38 79 


ONE A MUS бәз COELI: A8; 18. 4, иЗ od 
(P.U., D.St. 1963; W.P.C. 


4.16 Explain how Chebyshev's rule can be used to answer questions about a data set. Using this rule, find and interpret the interval $\bar{x} \pm 2S$ for question 4.15.
4.17 The following table gives the frequency distribution of expenditure on food per family per month among working class families in two localities. Find the arithmetic mean and the standard deviation of the expenditure at both places.

Range of expenditure in rupees per month | Place A | Place B
` 30-60 
60-90 
90-120 
120-150 
150-180 
180-210 
210-240 


(P.U., B.A., (Part-I), 1961; М.А. Econ. 1970) 


4.18 What do you understand by Variance? The wages of 1,000 employees range from Rs.4.50 to Rs.19.50. They are grouped in 15 classes with a common class interval of Rs.1, and the class frequencies from the lowest class to the highest class are: ..., 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9 and 6. Find the mean wage and its standard deviation.

(P.U., D.St., 1965, P.C.S. 1986)


4.19 The breaking strength of 20 test pieces of a certain alloy is given as under:
95 103 97 130 96 78 95 89 68 
82 79 69 67 830) 8 94: 87.93 117 


Calculate the average breaking strength of the alloy and the standard deviation. Calculate the percentage of observations lying within the limits: mean ± S; mean ± 2S; mean ± 3S; where S stands for the standard deviation. (P.U., B.Com., 1961, I.U., M.A. Econ. 1989)


Vering the production of a new style of collar to attract young 
of neck circumferences are available based upon measurements 


Bahe aches) [125 [130| 135 [reo [s [55 ss [res ues] 
Б ее ааа 


Compute the standard deviation and use the criterion $\bar{x} \pm 3$ (standard deviation) to determine the largest and smallest size of collars he should make in order to meet the needs of practically all his customers, bearing in mind that collars are worn, on average, ¾ inches larger than neck size.
(P.U., B.A./B.Sc. 1960)


values of the arithmetic mean and standard deviation of the following frequency 
"hution arces variable derived from the use of working origin and scale are 
ively. Determine the actual class-intervals. 


(P.U., D.St5 1964; В.А./В.5с. 1973) 
4.22 In the manufacture of a certain scientific instrument great importance is attached to the life of a particular critical component. This component is obtained in bulk from two sources, A and B, and in the course of inspection, the lives of 1,000 of the components from each source are determined. The following frequency tables are obtained:


SA XU pe cepa Un] [ie TD 
1010 1,030 — 1,040 
4930 1,040 - 1,050 
1959 1,050 — 1,060 
1979 1,060 - 1,070 
10401 L 1,070 — 1,080 
11o | 1100- 1,120 1,080 — 1,090 


Examine the effectiveness of the measures of dispersion with which you are familiar for comparing the dispersions of the two distributions.


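4.23 Show that, for any discrete distribution, the mean deviation about the mean is not greater than the standard deviation. (P.U., B.A.

Solution. By definition,

Mean Deviation $= \frac{1}{n}\sum f|x - \bar{x}|$ and

$S = \sqrt{\frac{1}{n}\sum f(x - \bar{x})^2}$, where $n = \sum f$.

Writing $d_i = |x_i - \bar{x}|$, we are required to show that

$$\frac{1}{n}\sum f_i d_i^2 \ge \left(\frac{1}{n}\sum f_i d_i\right)^2$$

i.e. $n\sum f_i d_i^2 \ge \left(\sum f_i d_i\right)^2$
i.e. $(f_1 + f_2 + \dots + f_k)(f_1 d_1^2 + f_2 d_2^2 + \dots + f_k d_k^2) \ge (f_1 d_1 + f_2 d_2 + \dots + f_k d_k)^2$
i.e. $\sum f_i^2 d_i^2 + \sum_{i<j} f_i f_j (d_i^2 + d_j^2) \ge \sum f_i^2 d_i^2 + 2\sum_{i<j} f_i f_j d_i d_j$
i.e. $\sum_{i<j} f_i f_j (d_i - d_j)^2 \ge 0$,
which is true because $(d_i - d_j)^2$ is never negative.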


4.24 a) What is the co-efficient of variation? What purpose does it serve? 

b) The following data have been obtained from a frequency distribution of a 
making the substitution X = 62+ 5и: Xf =120, X fu = 140, У fu? —598, = 
co-efficient of variation, using corrected standard deviation: 

(P.U., B 


4.25 а) What do you mean by absolute and relative measures of dispersion? State the 
co-efficient of variation in statistical analysis. (P.U., В.А. 




b) Given below is the distribution of weekly income (to the nearest rupee) of 100 households in a locality A. Calculate the standard deviation.


ГЕЧЕ 
[s о ЖШ е а Га о EA 


au -- n2 "v 22. БЕ] 27 ГЕ; 
If 123 households in a different locality B had a mean weekly income of Rs.52.28 and a 
standard deviation of Rs.4.96, then compare the variability of the weekly income of two 
localities. (B.Z.U., M.A. Econ. 1986; P.U., В.А./В.5с. 1974) 


a) Define Range, Mean Deviation and Standard Deviation, and compare their merits as descriptive measures of dispersion.

b) Two candidates X and Y at the B.A. (Hons.) Examination obtained the following marks in ten papers. Which of the candidates showed a more consistent performance?

Paper | I | II | III | IV | V | VI | VII | VIII | IX | X


+ [жэ жюн п азе ч 


ШИЕ: 
um Е on. — P.U.. B.AJB.Sc. 1974) 


a) Calculate the corrected co-efficient of variation when mean = 67.45, variance (uncorrected) = 8.5275 and the class-interval ... (P.U., B.A./B.Sc. 1968)


b) Ure wa bå AE: 


Who is better as a run getter? Who is the more consistent player?
(P.U., B.A./B.Sc. 1970; B.Z.U., M.A. Econ. 1984)


a) Explain the difference between absolute dispersion and relative dispersion. Describe the properties of the standard deviation.

b) Find the co-efficient of variation from the following data using both uncorrected and corrected standard deviations.
118-126 127-135 136-144 145-153 154-162 163-171 172-180

c) A manufacturer of television tubes has two types of tubes A and B. The tubes have respective mean life-times $\bar{x}_A = 1495$ hours and $\bar{x}_B = 1895$ hours, and standard deviations $S_A = 280$ hours and $S_B = 310$ hours. Which tube has the greater (i) absolute dispersion, (ii) relative dispersion? (P.U., B.A./B.Sc. 1969, 71)


4.29


4.30


4.31
Compare the variability of ... of families in two towns as given below:


(B.Z.U., M.A, Econ. 


Compare the variations in the following frequency distributions of weights of 
computing the co-efficient of variation in each case: Also draw a box plot. 


20<x<30     7     5
30<x<40    10
40<x<50    20
50<x<60    18
60<x<70     7

If in a series of measurements, we obtain $n_1$ values of magnitude $x_1$, $n_2$ of magnitude $x_2$ and so on, and if $\bar{x}$ is the mean value of all the measurements, prove that the standard deviation is

$$\sqrt{\frac{\sum n_r (k - x_r)^2}{\sum n_r} - \delta^2}, \quad \text{where } \bar{x} = k + \delta.$$

(B.Z.U., B.A./B.

a) For a group of ... boys, the mean score and standard deviation of scores on a test are ... For a group of 40 girls, the mean and standard deviation are 54.0 and ... on the same test. Find the mean and standard deviation for the combined group of children.
b) A distribution consists of three components with frequencies 200, 250 and 300, having means of 25, 10 and 15, and standard deviations of 3, 4 and 5 respectively. Show that the mean of the combined distribution is 16 and its standard deviation is 7.2 approximately. Find the coefficient of variation.


a) What is meant by a standardized variable?

b) Show that for any distribution expressed in standard measures, the mean is zero and the standard deviation is one.
(P.U., B.A/B

c) The mean of scores for a group of students on a certain test was 63.7 with a standard deviation of 12.3. Find the Z score for the top student, with a score of 98, and the bottom student, with a score of 21.


Three tests had the values in the following table for mean and standard deviations. 


Student A received grades of 70, 90, 70 on the three tests, while student B received grades of 90, 70, 70. Assuming that all three tests should carry equal weight, change the grades to Z scores and average each student's scores. (P.U., B.Sc. (Hons.) Part-I, 1971)


a) Explain what is meant by the trimmed and Winsorized measures of central tendency and dispersion.

b) Calculate the trimmed and the Winsorized means and standard deviations for the following set of 15 scores:

80, 75, 42, 63, 65, 43, 78, 96, 82, 58, 79, 72, ..., 68.

a) Define Moments about an arbitrary origin and about the mean. Express the moments about the mean in terms of the moments about any point and conversely.

b) Give Sheppard's corrections to moments and explain where they are used.
(P.U., B.A. (Part-I), 1961, 1962-S)

a) Prove that the second moment about an arbitrary origin equals the second moment about the mean increased by the square of the distance between the arbitrary origin and the mean. (B.Z.U., B.A./B.Sc. 1976)

b) Derive the relations which give the third and fourth moments about the mean in terms of moments about any origin.

a) What aspects or characteristics of a frequency distribution are measured by the moments?
(P.U. B.A./B.Sc. 2008)

b) Prove the general formula connecting the moments about the mean with the moments about the origin

$$\mu_r = \mu'_r - r\,\mu'_{r-1}\,\mu'_1 + \dots$$

(P.U. B.A./B.Sc. 1985, 89, 08)

c) Explain how changes of origin and of scale affect the following:
Mean, standard deviation, moments and $\beta_2$. (P.U. B.A./B.Sc. 1985, 89)

a) The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate the four moments about the mean and about zero. (P.U., B.A./B.Sc. 1973, 76, 08)




444 


445 


4.46 
b) The following data have been obtained from a frequency distribution of a variable X after making the substitution x = 10 + 5u.

$\sum f = 125$, $\sum fu = -46$, $\sum fu^2 = 306$, $\sum fu^3 = -242$ and $\sum fu^4 = 1962$.

Calculate the Mean, Variance, $b_1$ and $b_2$ (moment-ratios). Would you consider the distribution normal? (P.U., B.A./B.Sc. 1975,


Calculate the first four moments about the mean for the following data. 
1 2 3 4 5 6 7 8 9 


Calculate the first four moments and apply the test of normality to the following data which pertain to weekly earnings in rupees of 200 labourers.

Weekly wages     | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
No. of Labourers |  6 | 19 | 13 | 18 | 20 | 25 | 28 | 34 | 22 | 15


Compute the first four moments about the mean from the following frequency distribution. Also calculate $b_1$ and $b_2$.

Groups | 2-4 | 4-6 | 6-8 | 8-10 | 10-12 | 12-14 | 14-16 | 16-18 | 18-20
[Frequency | 18 | 24 | 47 | 80 | 102 | 6? | 21 | 15 | 


(P.U., D.St T 
a) Define the moment-ratios $\beta_1$ and $\beta_2$. What information do they give?

b) Eight coins were tossed together 256 times and the frequency of the occurrence of 0, 1, ..., 8 heads in a throw was recorded.

Calculate the values of moment-ratios $b_1$ and $b_2$.

Calculate the values of $b_1$ and $b_2$ from the following data. Use Charlier check.


52 8 12% \ 14-16: 16-18 18-20 2022 22-24 24-26 26-28 
Lf | 


110 218 275 222 108 32 


a) What is meant by skewness? How would you find it in a non-symmetrical distribution? Distinguish between positive and negative skewness.

b) Define the moment-ratios $\beta_1$ and $\beta_2$, and state the purpose for which they are used.
(P.U., B.A. (Part-I), 1

a) State the measures commonly employed to define Skewness and Kurtosis. What aspects of the frequency curve are measured by them?

b) What can you say of the skewness in each of the following cases?
i) The median is 49.21 while the two quartiles are 37.15 and 61.27.
ii) Mean = 1403 and Mode = 1487.
iii) The first three moments about 16 are respectively −0.35, 2.09 and 1.93.
(P.U., B.A./B.Sc. 1974, 78; M.A. Econ. 1


243 Calculate skewness by (i) the Pearsonian method, and (ii) by the Bowley's formula from the 
following frequency distribution and interpret the result. 


15-19 | 20-24 | 25-29 35-39 


[ No. ofMen | 29 | 176 | 208 | 173 | 82 | 40 | 15 | 3 | 
(P.U., М.А. Econ. 1986) 


Calculate the first four moments about the mean and provide one estimate each of skewness 
and kurtosis of the following distribution. 
ВР = 25: 30 35 40 45: 50 55 460 
Frequency: 2 E 118.227. 23 Иб 7 2 
(P.U., М.А. Econ. 1967, I.U., 1992) 


The following values have been obtained from two different frequency distributions of weights (lb) having 125 and 200 observations respectively after making the substitution:

X = 16 + 5u, Y = 20 + 2v,

a) $\sum fu = -46$, $\sum fu^2 = 306$, $\sum fu^3 = -242$, $\sum fu^4 = 1962$

b) $\sum fv = 21$, $\sum fv^2 = 1265$, $\sum fv^3 = -627$, $\sum fv^4 = 14169$

Find i) which of the distributions is more consistent.
ii) which of the distributions is negatively skewed.
iii) which of the distributions is mesokurtic. (P.U., B.A./B.Sc. 1993)


a) The fourth mean moment of a symmetrical distribution is 243. What would be the value of the standard deviation in order that the distribution may be mesokurtic?
(P.U., B.A./B.Sc. 1981)

b) In a certain distribution, the first four moments about the point 4 are −1.5, 17, −30 and 108. Calculate $b_1$ and $b_2$ and state whether the distribution is leptokurtic or platykurtic. (P.U., D.St., 1962)

a) The second moments about the mean of two distributions are 9 and 16, while the fourth moments about the mean are 230 and 780 respectively. Which of the distributions is (i) leptokurtic, (ii) mesokurtic, (iii) platykurtic?

b) The second moment about the mean of a symmetrical distribution is 25. What must be the value of the fourth moment about the mean in order that the distribution be (i) leptokurtic, (ii) mesokurtic, (iii) platykurtic? (P.U., B.A./B.Sc. 1971)

Discuss the various measures or quantities by which the characteristics of frequency distributions are measured and compared. (P.C.S. 1993)

Explain briefly how averages, measures of dispersion, skewness and kurtosis are complementary to one another in understanding a frequency distribution.

The mean and standard deviation of a variable X are 60 and 8.944 respectively. Find the mean and standard deviation of a new variable if

i) All the values of X are increased by 20 points.

ii) All the values of X are increased by 25%. (P.U., B.A./B.Sc. 2008-S)
4.55 The mean, mode and standard deviation of the weekly earnings of a random sample of workers from a locality are 3133.33, 2804.35 and 796.70 respectively.
i) Calculate Skewness of the distribution and interpret the result. Also find the coefficient of variation.
ii) What will happen to the values of the mean and standard deviation if every woman has an increase of Rs.500 per week?
iii) What will the mean and standard deviation be if every woman has an increase of 1 
previous earnings? 
(P.U., В.А./В.ӛс. 


5.1 INTRODUCTION


An index number is a statistical measure of average change in a variable or a group of variables with respect to time or space. The variable may be the enrolment of students in an institution, the cost of education for college students, prices of a particular commodity or a group of commodities, wages of workers, volume of trade, sales, exports and imports, production, unemployment, group health, Government securities, etc. Index numbers are obtained by expressing the data for various periods or places as percentage of some specific period or place selected for the purposes of comparison and technically called the base. Index numbers may be computed on weekly or monthly basis but generally they are computed on annual basis.


The classical definition of an index number is provided by the English economist F.Y. Edgeworth (1845-1926) who states that "an index number is a quantity which shows by its variations, the changes over time or space of a magnitude which is not susceptible either of accurate measurement in itself or of direct valuation in practice."


In short, an index number is a device that measures the changes occurring in data from time to time or from place to place.


5.1.1 Simple and Composite Index Numbers. Index numbers are generally classified into Simple Indices and Composite Indices. An index number is called a simple index when it is computed for a single variable. Index numbers of enrolment in colleges, index numbers of gold prices, etc. are examples of simple indices. A simple index can be very easily computed: the value of the variable for each period is divided by its value in the base period and the result is multiplied by 100. For example, the wages paid to the workers in a certain institution in 1980 and 1983 were Rs. 9,650 and Rs. 11,580 respectively. Now, taking 1980 as the base year and 1983 as the given year, we have

$$\text{Wage Index for 1983} = \frac{11{,}580}{9{,}650} \times 100 = 120.$$

This result indicates that if the wage level in 1980 were denoted by 100, it is 120 in 1983. In other words, wages have increased by 20% for 1983 by comparison with 1980.
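
The arithmetic of the wage example can be checked with a few lines of Python. This is only a minimal sketch; the function name `simple_index` is an illustrative choice and does not come from the text.

```python
def simple_index(value, base_value):
    """Express `value` as a percentage of `base_value` (base period = 100)."""
    if base_value == 0:
        raise ValueError("the base-period value must be non-zero")
    return value / base_value * 100


# Wages from the example above: Rs. 9,650 in 1980 (base) and Rs. 11,580 in 1983.
wage_index_1983 = simple_index(11_580, 9_650)
print(round(wage_index_1983, 1))  # 120.0
```

Subtracting 100 from the index gives the percentage change relative to the base year, here an increase of 20%.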


An index that is computed from two or more variables is referred to as a composite index. Examples of composite indices are the wholesale price index, consumer price index numbers, etc. Composite indices are more important as many of the index numbers in common use are composite in nature.


Composite indices may further be classified into Unweighted and Weighted index numbers. Before we discuss these index numbers and the methods of their computation, let us first consider the problems one has to face in the compilation of these index numbers.


5.1.2 Problems Involved in Index Number Construction. In the compilation of index numbers, several problems are involved. The first problem is to understand the purpose which an index is to serve. The purpose of the index may be to compare the scores of two students, or to measure the changes in the price level, or to measure the changes in the production of scooters, or to compare the changes in wages of factory workers over different places, etc. The next problem is to decide what should be included. The data to be included should relate to the purpose for which the index is to be used. This step also involves the collection of data on scores, production, prices, wages or whatever is being compared. Another problem is to decide what period should be chosen as the base period, i.e. the period with which the other periods are to be compared. In the case of composite index numbers, another problem is to decide what method of averaging should be used to arrive at a single index for each period. The method of averaging usually includes the system of weighting, but sometimes one faces the problem of assigning some explicit weights to the various items of the data so that their relative importance is taken into account.

These problems are discussed in some detail in Section 5.2 with reference to the construction of price index numbers, as the indices used to measure the change in price level are more important.


5.2 MAIN STEPS IN THE CONSTRUCTION OF INDEX NUMBERS OF WHOLESALE PRICES

The usual method of compilation of an index number of wholesale prices involves the following steps:

i) Selection of commodities to be included, their number and price quotations.

ii) Selection of the base period and calculation of price relatives.

iii) Selection of average to be used.

iv) Selection of appropriate weights.

5.2.1 Selection of Commodities for Inclusion. The first step is to decide on the number of commodities to be included. There is no hard and fast rule for this purpose. Though in statistical theory there is a well recognised principle that the larger the number of items included, the greater would be the accuracy, we know that a very large number of commodities would involve complications, expense and delay in the construction. Hence a reasonable number of commodities, chosen on the basis of their evaluated importance, should be used.

As pointed out by Dr. Irving Fisher (1867-1947), index numbers of prices are seldom of much value unless they consist of more than 20 commodities, and 50 is a much better number. However, it is important to bear in mind that, in deciding on their number, the commodities to be selected must be (i) representative of the tastes, habits and requirements of the people concerned, (ii) unlikely to vary in quality or grade and (iii) comparable.


Having decided on the number of commodities, arrangements are made to collect the wholesale prices of the commodities chosen. The prices should be obtained from the various sources, e.g. from exchanges and big markets where they are quoted, from price bulletins, trade journals and newspapers, and from leading firms. The price quotations obtained should be representative, reliable and comparable as regards the quality of the commodities and the units in which the commodities are expressed.


5.2.2 Selection of the Base Period. The next step is the selection of a base period, i.e. a period from which the changes are measured. The prices of all other periods are then expressed as percentages of the base period prices. Two methods of selecting the base period are available. They are the Fixed base method and the Chain base method.


Fixed Base Method. A fixed base method is one in which a particular year is generally chosen as the base period that remains unchanged during the life term of the index. It is relevant to note that the base year should not be too far distant in the past and should be a "normal" year. By a "normal" year, we generally mean a year of economic stability, free from any major financial crisis caused by inflation, wars, labour unrest, lock-outs, famines, etc. In other words, it is a year during which prices have remained more or less stable. If a single year of normal conditions is not available, an average of prices of several years is used as the base period price. This average minimizes the influence of abnormal and disrupted economic conditions.


It is customary to denote the base period by the subscript 0, e.g. $p_0$ (or $q_0$) will denote the price (or quantity) of a commodity in the base period, while the subscripts 1, 2, ..., n denote the other time periods in chronological order. The average price of the base period chosen is then set equal to 100. Index numbers (or price relatives) for other periods, denoted by $P_{01}, P_{02}, \ldots, P_{0n}$, are then computed as relative to the base period. Thus the price relative for the given year n will be

$$\text{Price relative} = \frac{\text{Price of a commodity in the given year}}{\text{Price of the commodity in the base year}} \times 100,$$

$$P_{0n} = \frac{p_n}{p_0} \times 100.$$

The price relative expresses the price of a commodity in a given year as a proportion of the price in the base year. It is multiplied by 100 to make it a percentage but is usually written without the percent symbol.


The price relative is independent of any unit of measurement.


Chain Base Method. A chain base method is one in which the base period is not fixed but moves with the given year. That is, the relatives are computed with the immediately preceding year as the base. Such relatives are called link relatives. Thus a

$$\text{Link relative} = \frac{\text{Price of a commodity in the given year}}{\text{Price of the commodity in the preceding year}} \times 100,$$

$$P_{n-1,n} = \frac{p_n}{p_{n-1}} \times 100.$$

Link relatives can be converted into indices on a fixed base by multiplying together the link relatives (without the factor 100) involved between the two years. This process of conversion is called the chaining process and the indices thus determined are the chain indices. In other words, we say that the link relatives are "chained" back to a fixed base period by a process of successive multiplication and hence the name "chain index". For example, if $P_{01}, P_{12}, P_{23}, \ldots, P_{n-1,n}$ denote the link relatives (or averages of link relatives) without the factor 100, then the indices on fixed base are obtained as below:

$$P_{01} = P_{01}$$

$$P_{02} = P_{01} \times P_{12}$$

$$P_{03} = P_{01} \times P_{12} \times P_{23}$$

$$\vdots$$

$$P_{0n} = P_{01} \times P_{12} \times P_{23} \times \cdots \times P_{n-1,n}$$

If the link relatives are in percentages, the product, considered pair-wise, is divided by 100. It is worth noting that this chain index was originally suggested in 1887 by Alfred Marshall (1842-1924).
The chain index method has several advantages. They are:

i) The chain method provides a direct comparison between each year and the preceding year, and it is in such terms that a businessman often thinks.

ii) The chain base method allows the addition of new commodities, removal of old commodities, or the substitution of one commodity for another.

iii) It is possible to change the geographical coverage or the weight of a commodity to suit changing conditions.

iv) It satisfies the so-called circular test (to be explained later).

v) An index with a fixed base period can be computed by the product of link relatives.


It suffers, however, from the following disadvantages:

i) The computational procedure of chain indices is relatively cumbersome.

ii) If an error is committed during the chaining process, it will be carried through the entire series.

iii) The changes in two years separated by a long interval cannot be compared directly.

Since the merits of the chain index outweigh its demerits, it is therefore considered the better index.

5.2.3 Selection of Average. The next step involves the choice of an appropriate average to obtain a single index number for each year. For this purpose, any of the following averages may be used:

(a) the arithmetic mean, (b) the median, and (c) the geometric mean.


The advantages and disadvantages of these averages in the construction of index numbers are given below:


The Arithmetic Mean. It is easily understood and is easier to compute. It is amenable to mathematical treatment, i.e. the means of subgroups can be averaged to find the mean of all the data. The disadvantages of the mean are that it is greatly affected by extreme values; it gives too much weight to increasing prices and too little to decreasing ones, i.e. it is biased upward. Moreover, the mean of relatives is not reversible, i.e. a change of base cannot be made without affecting the proportionate change in the index number.


The Median. It is easy to understand as well as to compute. It is less affected by extreme values than the mean and does not overemphasize increases. The defects of the median are that it is not amenable to algebraic treatment, i.e. the medians of sub-groups cannot be averaged to obtain the median of all the data, and the median of relatives is not reversible.


The Geometric Mean. The geometric mean is a suitable average for ratios as it gives equal importance to equal ratios of change. It makes it possible to replace the commodities which have ceased to be representative by those which have become representative. The geometric mean of relatives is reversible, and hence we can change a series of index numbers with any year as base to any other year of the series as base by dividing each index number of the series by the index number of the new time period selected for base. But it has two disadvantages. It is an unfamiliar type of average and it involves considerable computational labour.


Theoretically, the geometric mean is the most suitable average but, in practice, the arithmetic mean is generally employed.
5.2.4 Selection of Appropriate Weights. The last important step in the construction of wholesale price index numbers is to decide how to select weights which would indicate the relative importance of various commodities in the group. It is evident that all the commodities selected are not equally important. For example, eggs and coffee cannot be given the same importance as wheat and rice. Wheat being more important than coffee, it is desirable that wheat be given more importance. To meet this requirement of taking into account the relative importance of each commodity, a sample survey should be conducted. On the basis of this survey, each commodity should be assigned a multiplier that expresses more or less adequately its relative importance in the group. Such a multiplier is technically called a weight. The weights could be either a set of suitable numbers adding to 100, or the quantities of the various commodities actually consumed, produced or sold, or their money values. It is customary to apply quantity weights when dealing with prices themselves and value weights when using price relatives. The quantity or value weights may relate to the base period or to the given period. If the base period quantity ($q_0$) is used as weight, the price index number, $P_{0n}$, for the year n is given by the formula

$$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100,$$

which is called the base-year weighting. With the given period quantity ($q_n$) as weight, the corresponding formula becomes

$$P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100,$$

which is also known as the given or current year weighting. It is relevant to note that different systems of weighting would not generally lead to identical results.


5.3 UNWEIGHTED INDEX NUMBERS

Unweighted indices are generally divided into Simple Aggregative indices and Simple Average of Relatives.

5.3.1 Simple Aggregative Index. A simple aggregative index is one that indicates the percentage change in the aggregate price of a number of commodities (say k) at different periods. It is obtained by dividing the sum of the given year prices of all commodities by the sum of the base year prices of the same commodities and expressing the result as a percentage. Symbolically, we have

$$P_{0n} = \frac{\sum p_n}{\sum p_0} \times 100,$$

where $P_{0n}$ denotes the price index for the given year n relative to the base year 0,
$\sum p_n$ denotes the sum of prices for the given year, and
$\sum p_0$ denotes the sum of prices for the base year.


A simple aggregative index is easy to understand as well as to apply, but it has the disadvantage of not taking into account the relative importance of the various commodities. Moreover, the units of prices of different commodities, being different, influence the price index, which then becomes an inappropriate measure. That is why it is seldom used in practice.
5.3.2 Simple Average of Relatives. A simple average of price relatives is an index obtained by taking the average of the price relatives of the given commodities for each year and expressing it as a percentage. If we take the arithmetic mean, then we have

$$P_{0n} = \frac{1}{k} \sum \left( \frac{p_n}{p_0} \right) \times 100,$$

where k denotes the number of commodities whose price relatives are thus combined.


The simple average of relatives index is superior to the simple aggregative index. It suffers from the disadvantage that each price relative exerts equal influence and gives no consideration to the relative importance of each commodity. Moreover, the use of the arithmetic mean, which is not an appropriate average to use with ratios, results in an upward bias. This sort of drawback may be got rid of by using the geometric average, in which case the formula becomes

$$P_{0n} = \left[ \prod \left( \frac{p_n}{p_0} \right) \right]^{1/k} \times 100,$$

where k denotes the number of commodities whose price relatives are being averaged.

The median may also be used for averaging the price relatives.
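
The three ways of averaging price relatives can be compared in a short Python sketch. The prices below are hypothetical and serve only as an illustration; the function names are likewise illustrative. The sketch relies on the standard-library `statistics` module (Python 3.8 or later for `fmean` and `geometric_mean`).

```python
import statistics


def price_relatives(base_prices, given_prices):
    """Price relative of each commodity: (p_n / p_0) * 100."""
    return [pn / p0 * 100 for p0, pn in zip(base_prices, given_prices)]


def average_of_relatives(relatives, method="arithmetic"):
    """Combine the price relatives into one index using the chosen average."""
    if method == "arithmetic":
        return statistics.fmean(relatives)
    if method == "geometric":
        return statistics.geometric_mean(relatives)
    if method == "median":
        return statistics.median(relatives)
    raise ValueError(f"unknown averaging method: {method}")


# Hypothetical prices of three commodities (not taken from the text).
base_prices = [10.0, 4.0, 25.0]
given_prices = [12.0, 3.0, 30.0]

relatives = price_relatives(base_prices, given_prices)  # about 120, 75 and 120
for method in ("arithmetic", "geometric", "median"):
    print(method, round(average_of_relatives(relatives, method), 1))
```

For these figures the arithmetic mean (about 105.0) exceeds the geometric mean (about 102.6), which illustrates the upward bias of the arithmetic mean mentioned above.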


Example 5.1 Following are the prices of a commodity for the ten years ending with 1957. Calculate the index numbers with (i) 1948 as a base; and (ii) the average of the first five years as a base.

| Year | 1948 | 1949 | 1950 | 1951 | 1952 | 1953 | 1954 | 1955 | 1956 | 1957 |
| Price in Rs. | 5.25 | 5.87 | 6.12 | 5.58 | 6.25 | 6.62 | 6.75 | 7.12 | 6.50 | 7.50 |

(P.U., B.A. (Hons. in Econ.)

Let $p_0$ denote the base year's price and $p_n$ the given year's price. Then the prices are converted into price relatives or price indices by the formula

$$\text{Price relative for a given year} = \frac{p_n}{p_0} \times 100.$$
[Table of commodity prices in 1957 and 1958 not recoverable; the last row shows prices of 1.25 (1957) and 1.35 (1958).]


Compute a simple aggregative price index number for the year 1958 based on 1957 prices.
(P.U., M.A. Econ. 1969-S)

The simple aggregative price index number is given by

$$P_{0n} = \frac{\sum p_n}{\sum p_0} \times 100 = \frac{335.00 + 32.00 + \cdots + 1.35}{351.00 + 35.00 + \cdots + 1.25} \times 100 = \frac{384.21}{401.50} \times 100 = 95.7.$$

The simple aggregative price index for 1958 on the basis of 1957 is 95.7. This means that the prices have decreased by 4.3%.


Example 5.3 Compute the price index for the year 1958 based on 1957 prices, using the simple average of relatives method for the data in Example 5.2.


The computation of the simple average of price relatives index involves the calculation of price relatives for 1958 by dividing the prices in 1958 by the corresponding prices in 1957. The price relatives are given below:

[Table of price relatives not recoverable.]

The resulting index indicates an increase in prices from 1957 to 1958 by 6.5%.


It should be noted that the index numbers obtained by the two methods show considerable difference because the simple aggregative index has been affected by the units of prices of commodities, and although the influence due to different units has been eliminated by the simple average of relatives, commodities such as salt, sugar, cloth, etc. have exercised an influence out of all proportion to their economic importance.


Example 5.4 From the data given below, compute the index numbers of prices, taking … as base. Use (i) simple average of price relatives and (ii) the median of price relatives.

[Data table and table of price relatives, with columns for (i) the simple average and (ii) the median, not recoverable.]

The other entries are computed in a similar way.


The price index numbers appearing in the last column of the table have been obtained by taking the medians of the price relatives.


Example 5.5 Construct chain indices for the following years, taking 1940 as the base and using the simple average of relatives.

(P.U., B.A. (Hons. in Econ.),
We calculate the link relatives by the formula

$$P_{n-1,n} = \frac{\text{Price of a commodity in the given year}}{\text{Price of the commodity in the preceding year}} \times 100 = \frac{p_n}{p_{n-1}} \times 100.$$

Next, we multiply the averages of link relatives successively (taking two at a time) and divide the product by 100 to chain the relatives back to the base 1940. The calculations appear in the table below. The first row refers to the base year 1940; the link relatives of the individual commodities are not recoverable.

| Simple average of link relatives | Chain index |
| 100 | 100 |
| 114 | 100 × 114 / 100 = 114 |
| 104 | 114 × 104 / 100 = 118.6 |
| 108 | 118.6 × 108 / 100 = 128.1 |
| 105 | 128.1 × 105 / 100 = 134.5 |

Hence the chain index numbers are 100, 114, 118.6, 128.1 and 134.5.
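
The chaining step of Example 5.5 can be sketched in Python as follows. The function name `chain_indices` and its `round_each_step` parameter are illustrative assumptions; rounding to one decimal place at every step mirrors the hand computation in the example.

```python
def chain_indices(link_relatives, base=100.0, round_each_step=None):
    """Chain percentage link relatives back to a fixed base period.

    `link_relatives` holds the (average) link relative of each year after the
    base year; each chain index is the previous one times the link / 100.
    """
    indices = [base]
    for link in link_relatives:
        value = indices[-1] * link / 100
        if round_each_step is not None:
            value = round(value, round_each_step)
        indices.append(value)
    return indices


# Simple averages of link relatives from Example 5.5 (years after the base 1940).
average_links = [114, 104, 108, 105]
print(chain_indices(average_links, round_each_step=1))
# [100.0, 114.0, 118.6, 128.1, 134.5]
```

Without the intermediate rounding the later indices differ slightly in the first decimal place, which is worth keeping in mind when comparing a computer result with a hand calculation.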


Example 5.6 Find the chain indices from the following price relatives of four commodities, using the geometric mean.

[Table of price relatives of the four commodities not recoverable.]

First we calculate the link relatives. Each link relative is the price relative of the given year divided by the price relative of the preceding year, multiplied by 100. The link relatives for commodity A (for example) are given below:

Link relative for 1992 = 76
Link relative for 1993 = 168
Link relative for 1994 = 89
Link relative for 1995 = 65

The link relatives for other commodities are obtained in a similar way. The chain indices, using the geometric mean, are then computed and shown in the table below:

Calculation of chain indices, using the geometric mean of the link relatives

| Year | Geometric mean of link relatives | Chain index |
| 1991 | 79.9 | 79.9 |
| 1992 | 96.2 | 79.9 × 96.2 / 100 = 76.9 |
| 1993 | 130.2 | 76.9 × 130.2 / 100 = 100.1 |
| 1994 | 100.6 | 100.1 × 100.6 / 100 = 100.7 |
| 1995 | 77.9 | 100.7 × 77.9 / 100 = 78.4 |

Hence the chain indices by geometric mean are 79.9, 76.9, 100.1, 100.7 and 78.4.


5.4 WEIGHTED INDEX NUMBERS


An index number that measures the change in the prices of a group of commodities when the relative importance of the commodities (i.e. weight) has been taken into account is called a weighted price index number. Weighted indices are generally divided into Weighted Aggregative indices and Weighted Average of relatives indices.


5.4.1 Weighted Aggregative Price Index Numbers. An index is called a weighted aggregative index when it is constructed for an aggregate of items (prices) that have been weighted in some way (by the corresponding quantities produced, consumed or sold) so as to reflect their importance. There are various kinds of weighted aggregative index numbers, some of which are discussed below:


(a) Laspeyres' Price Index, advocated by the German economist Étienne Laspeyres (1834-1913) in 1864, is defined as

$$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100.$$
This is the percentage ratio of the aggregate of the given period prices weighted by the quantities produced, consumed or sold in the base period to the aggregate of base period prices weighted by the base period quantities. The index represents the relative cost in different years of purchasing the base year quantities of various commodities at the given year price. The advantage of Laspeyres' formula is that the weights remain unchanged for the subsequent periods and only information on the latest prices need be obtained. It has, however, a few limitations. The index value obtained by this formula gets somewhat inflated as the series moves away from the base period. That is why an index of Laspeyres type is said to have an upward bias. It does not satisfy the time-reversal test or the factor-reversal test (discussed later).


(b) Paasche's Price Index, proposed in 1874 by the German economist Hermann Paasche (1851-1925), is given by the relation

$$P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100.$$

This is the percentage ratio of the aggregate of given period prices weighted by the quantities produced, consumed or sold in the given period to the aggregate of base period prices weighted by the given period quantities. It represents the relative cost in different years of purchasing the given year quantities of various commodities at the given year price. The computation of this type of index needs fresh data on quantities (i.e. weights) for each given year. To obtain such data, an enquiry or a statistical survey, which normally involves considerable time and finances, would be needed every year, and this is a difficult task. This index has a downward bias, i.e. it deflates the index numbers of distant periods of time. It also does not obey the time-reversal test or the factor-reversal test.


(c) Marshall-Edgeworth Price Index. This index was proposed independently by the two English economists Alfred Marshall (1842-1924) and F.Y. Edgeworth (1845-1926). Here the weights are taken as the average of the respective quantities in the base period and in the given period. This is, so to say, a compromise solution, and it has no general bias in either direction. Since $\tfrac{1}{2}(q_0 + q_n)$ lies between $q_0$ and $q_n$, the Marshall-Edgeworth index number lies between the Laspeyres' and Paasche's index numbers. The formula for this index is

$$P_{0n} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \times 100.$$

(d) Fisher's "Ideal" Index. Fisher's ideal index number, named after its inventor, Irving Fisher (1867-1947), is the geometric mean of the Laspeyres and Paasche type of index numbers. Symbolically, it is given by

$$P_{0n}(\text{Fisher}) = \sqrt{P_{0n}(\text{Laspeyres}) \times P_{0n}(\text{Paasche})} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100.$$

Fisher called it the "ideal" index because it meets certain theoretical tests of quality which he considered appropriate for a good index number. It is sometimes known as the crossed-weight formula because it is the result of geometrically crossing (averaging) two index numbers with different systems of weighting.
Fisher's ideal index has a theoretical advantage over other index numbers as it is the only formula that obeys both the time-reversal test and the factor-reversal test. It suffers, however, from the following disadvantages:

i) The ideal index number, being the geometric mean of the Laspeyres index that has an upward bias and the Paasche index which suffers from a downward bias, is considered to give a better result. But we are not sure about the elimination of the biases as "the average of two wrong answers does not necessarily give one right answer."

ii) It is difficult to say specifically what the ideal index number measures because of its being a hybrid of two index numbers.

iii) The computation of the ideal index is relatively difficult and laborious.

iv) Its computation needs information on quantities (consumed, produced or sold) for each period. To get such information, a fresh survey, which may be too costly or time-consuming, would be needed every year.


(e) Walsh Index. Walsh advocated that the weights should be the geometric mean of the base and given period quantities instead of taking their simple average. Symbolically, it is given as

$$P_{0n} = \frac{\sum p_n \sqrt{q_0 q_n}}{\sum p_0 \sqrt{q_0 q_n}} \times 100.$$

There is another formula known as the Lowe index number in which average weights are used:

$$P_{0n} = \frac{\sum p_n q}{\sum p_0 q} \times 100,$$

where q is obtained by averaging the quantities of several years.
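
The weighted aggregative formulas (a) to (e) differ only in the weights applied to the prices, which a short Python sketch makes explicit. The prices and quantities below are hypothetical, and the function names are illustrative choices rather than anything defined in the text.

```python
import math


def aggregative_index(p0, pn, weights):
    """Weighted aggregative price index: sum(pn * w) / sum(p0 * w) * 100."""
    numerator = sum(p * w for p, w in zip(pn, weights))
    denominator = sum(p * w for p, w in zip(p0, weights))
    return numerator / denominator * 100


def laspeyres(p0, pn, q0, qn):
    return aggregative_index(p0, pn, q0)  # base period quantities as weights


def paasche(p0, pn, q0, qn):
    return aggregative_index(p0, pn, qn)  # given period quantities as weights


def marshall_edgeworth(p0, pn, q0, qn):
    return aggregative_index(p0, pn, [a + b for a, b in zip(q0, qn)])


def walsh(p0, pn, q0, qn):
    return aggregative_index(p0, pn, [math.sqrt(a * b) for a, b in zip(q0, qn)])


def fisher(p0, pn, q0, qn):
    # Geometric mean of the Laspeyres and Paasche indices (both in percent).
    return math.sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))


# Hypothetical prices and quantities of three commodities.
p0, pn = [2.0, 5.0, 10.0], [3.0, 6.0, 11.0]
q0, qn = [100, 40, 10], [90, 50, 12]

for formula in (laspeyres, paasche, marshall_edgeworth, fisher, walsh):
    print(f"{formula.__name__:20s}{formula(p0, pn, q0, qn):8.2f}")
```

For these figures the Laspeyres index is 130.00 and the Paasche index about 127.64, and the Marshall-Edgeworth and Fisher indices fall between the two, as the text states.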


Example 5.7 From the following table, compute the weighted aggregative price index for 1999 on the basis of the year 1995.

[Table not recoverable: for each commodity (the legible rows are tea, vegetable ghee and eggs) it gave the average consumption and the prices in 1995 and 1999.]

To compute the weighted aggregative price index for the year 1999 with 1995 as base, we need $\sum p_n q_0$ and $\sum p_0 q_0$. The calculations appear below:

[Computation table not recoverable; its column totals are $\sum p_n q_0 = 326.75$ and $\sum p_0 q_0 = 253.75$.]

$$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = \frac{326.75}{253.75} \times 100 = 128.77.$$

The weighted aggregative price index for 1999 on the basis of 1995 is 128.77. This means that the prices have increased by 28.77%.


Example 5.8 Construct the following weighted aggregative price index numbers for 2000 and 2001 from the given data.

(i) Laspeyres' index, (ii) Paasche's index and (iii) Fisher's "Ideal" index.

[Data table and table of products not recoverable. In what follows the subscript 0 refers to the base year, 1 to 2000 and 2 to 2001.]

(i) Laspeyres' index numbers:

$$\text{Index for 2000 } (P_{01}) = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{41140}{35310} \times 100 = 116.5$$

$$\text{Index for 2001 } (P_{02}) = \frac{\sum p_2 q_0}{\sum p_0 q_0} \times 100 = \frac{39644}{35310} \times 100 = 112.3$$
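(ii) Paasche's index numbers:

$$\text{Index for 2000 } (P_{01}) = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{46707}{40048} \times 100 = 116.6$$

$$\text{Index for 2001 } (P_{02}) = \frac{\sum p_2 q_2}{\sum p_0 q_2} \times 100 = \frac{51724}{47376} \times 100 = 109.2$$

(iii) Fisher's "Ideal" index numbers:

$$\text{Index for 2000 } (P_{01}) = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{41140}{35310} \times \frac{46707}{40048}} \times 100 = 116.6$$

$$\text{Index for 2001 } (P_{02}) = \sqrt{\frac{\sum p_2 q_0}{\sum p_0 q_0} \times \frac{\sum p_2 q_2}{\sum p_0 q_2}} \times 100 = \sqrt{\frac{39644}{35310} \times \frac{51724}{47376}} \times 100 = \sqrt{1.123 \times 1.092} \times 100 = 110.7$$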
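
The index numbers of Example 5.8 can be re-derived from the column totals quoted in the solution. The following Python sketch is only a check on the arithmetic; the variable and function names are illustrative, and the subscripts follow the convention 0 = base year, 1 = 2000, 2 = 2001.

```python
import math

# Column totals as quoted in the worked solution of Example 5.8.
sum_p0q0 = 35310
sum_p1q0, sum_p1q1, sum_p0q1 = 41140, 46707, 40048  # year 2000
sum_p2q0, sum_p2q2, sum_p0q2 = 39644, 51724, 47376  # year 2001


def indices_from_totals(pnq0, p0q0, pnqn, p0qn):
    """Return (Laspeyres, Paasche, Fisher) in percent, rounded to one decimal."""
    laspeyres = pnq0 / p0q0 * 100
    paasche = pnqn / p0qn * 100
    fisher = math.sqrt(laspeyres * paasche)
    return tuple(round(x, 1) for x in (laspeyres, paasche, fisher))


index_2000 = indices_from_totals(sum_p1q0, sum_p0q0, sum_p1q1, sum_p0q1)
index_2001 = indices_from_totals(sum_p2q0, sum_p0q0, sum_p2q2, sum_p0q2)
print(index_2000)  # (116.5, 116.6, 116.6)
print(index_2001)  # (112.3, 109.2, 110.7)
```

In both years the Fisher index lies between the Laspeyres and Paasche values, as expected of their geometric mean.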


Example 5.9 The prices and quantities of three commodities during 1990 and 1994 are given below:

[Data table of prices and quantities not recoverable.]

Compute the Marshall-Edgeworth and the Walsh's price index numbers for 1994, using 1990 as the base period.

First we calculate the necessary products with 1990 as base year. These calculations are shown in the table below:

Computation of Weighted Aggregative Price Indices

| Commodity | $p_n(q_0+q_n)$ | $p_0(q_0+q_n)$ | $p_n\sqrt{q_0 q_n}$ | $p_0\sqrt{q_0 q_n}$ |
| A | 85,471.75 | 79,438.45 | 42,705.28 | 39,690.78 |
| B | 6,269.34 | 5,602.80 | 3,134.67 | 2,801.40 |
| C | 13,969.80 | 14,405.04 | 6,984.90 | 7,202.52 |
| Total | 105,710.89 | 99,446.29 | 52,824.85 | 49,694.70 |

Thus the Marshall-Edgeworth price index is

$$P_{0n} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \times 100 = \frac{105710.89}{99446.29} \times 100 = 106.3,$$

and the Walsh price index is

$$P_{0n} = \frac{\sum p_n \sqrt{q_0 q_n}}{\sum p_0 \sqrt{q_0 q_n}} \times 100 = \frac{52824.85}{49694.70} \times 100 = 106.3.$$
5.4.2 Weighted Average of Relatives Price Index Number. It is computed by multiplying each price relative by its weight, summing these products and dividing by the sum of the weights. The weights are the total values of the commodities. The important types of the weighted average of relatives indices are given below:

(a) Laspeyres' index number is

$$P_{0n} = \frac{\sum (p_n / p_0)\, p_0 q_0}{\sum p_0 q_0} \times 100,$$

where the price relatives are weighted by the total value of commodities in the base year. This is identical to Laspeyres' weighted aggregative price index, i.e. $\frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$. In other words, these are alternative ways of getting the same result.

(b) Paasche's index number is

$$P_{0n} = \frac{\sum (p_n / p_0)\, p_0 q_n}{\sum p_0 q_n} \times 100,$$

where the price relatives are weighted by the total value of commodities in the given year at base year prices. This is also identical with Paasche's weighted aggregative price index, i.e. $\frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$.

(c) Palgrave's index number is

$$P_{0n} = \frac{\sum (p_n / p_0)\, p_n q_n}{\sum p_n q_n} \times 100,$$

where the price relatives are weighted by the total value of commodities in the given year.

In all these index numbers, the computational labour is reduced if the weights are made to add to unity. For example, if $w_0 = \frac{p_0 q_0}{\sum p_0 q_0}$, then the Laspeyres formula becomes

$$P_{0n} = \sum \left( \frac{p_n}{p_0} \right) w_0 \times 100 \qquad \left( \sum w_0 = 1 \right).$$

The advantage of this procedure is that it indicates how many points each commodity contributes to the index number each year.
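
The equivalence between the weighted average of relatives and the weighted aggregative forms, and the "points contributed" interpretation of weights adding to unity, can be illustrated with a short Python sketch. The data are hypothetical and the function names are illustrative assumptions.

```python
def weighted_average_of_relatives(p0, pn, weights):
    """sum((pn/p0) * w) / sum(w) * 100 for value weights w."""
    relatives = [b / a for a, b in zip(p0, pn)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights) * 100


def aggregative_index(p0, pn, quantities):
    """sum(pn * q) / sum(p0 * q) * 100 for quantity weights q."""
    return (sum(p * q for p, q in zip(pn, quantities))
            / sum(p * q for p, q in zip(p0, quantities)) * 100)


# Hypothetical prices and quantities of three commodities.
p0, pn = [2.0, 5.0, 10.0], [3.0, 6.0, 11.0]
q0, qn = [100, 40, 10], [90, 50, 12]

base_values = [p * q for p, q in zip(p0, q0)]   # p0*q0 (Laspeyres weights)
mixed_values = [p * q for p, q in zip(p0, qn)]  # p0*qn (Paasche weights)
given_values = [p * q for p, q in zip(pn, qn)]  # pn*qn (Palgrave weights)

laspeyres_rel = weighted_average_of_relatives(p0, pn, base_values)
paasche_rel = weighted_average_of_relatives(p0, pn, mixed_values)
palgrave = weighted_average_of_relatives(p0, pn, given_values)

# Weights scaled to add to unity: each term is that commodity's contribution
# (in index points) to the Laspeyres index.
w0 = [v / sum(base_values) for v in base_values]
points = [b / a * w * 100 for a, b, w in zip(p0, pn, w0)]

print(round(laspeyres_rel, 2), round(aggregative_index(p0, pn, q0), 2))
print(round(paasche_rel, 2), round(aggregative_index(p0, pn, qn), 2))
print(round(palgrave, 2), [round(x, 1) for x in points])
```

With these figures the three commodities contribute about 60, 48 and 22 points respectively, which add up to the Laspeyres index of 130.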


Example 5.10 Using the data of Example 5.9, compute the weighted average of relatives price index numbers for 1994 by (i) Laspeyres' method, (ii) Paasche's method, and (iii) Palgrave's method.

Computation of Weighted Average of Relatives Price Indices

| Commodity | Price relative $(p_n/p_0)$ | $p_0 q_0$ | $p_0 q_n$ | $p_n q_n$ | $(p_n/p_0)\,p_0 q_0$ | $(p_n/p_0)\,p_0 q_n$ | $(p_n/p_0)\,p_n q_n$ |
| A | 1.076 | 38,216.25 | 41,222.20 | 44,353.00 | 41,120.68 | 44,355.09 | 47,723.83 |
| B | 1.119 | 2,714.40 | 2,888.40 | 3,232.02 | 3,037.41 | 3,232.12 | 3,616.63 |
| C | 0.970 | 7,264.08 | 7,140.96 | 6,925.20 | 7,046.16 | 6,926.73 | 6,717.44 |
| Total | | 48,194.73 | 51,251.56 | 54,510.22 | 51,204.25 | 54,513.94 | 58,057.90 |

Thus (i) Laspeyres' index for 1994 is

$$\frac{\sum (p_n / p_0)\, p_0 q_0}{\sum p_0 q_0} \times 100 = \frac{51{,}204.25}{48{,}194.73} \times 100 = 106.2,$$

(ii) Paasche's index for 1994 is

$$\frac{\sum (p_n / p_0)\, p_0 q_n}{\sum p_0 q_n} \times 100 = \frac{54{,}513.94}{51{,}251.56} \times 100 = 106.4, \text{ and}$$

(iii) Palgrave's index for 1994 is

$$\frac{\sum (p_n / p_0)\, p_n q_n}{\sum p_n q_n} \times 100 = \frac{58{,}057.90}{54{,}510.22} \times 100 = 106.5.$$


QUANTITY INDEX NUMBERS


They are intended to measure the changes in the physical volume or quantity produced, consumed or sold of certain goods or services with respect to time. Like a price relative, we define a quantity relative by the ratio

$$\text{quantity relative} = \frac{q_n}{q_0} \times 100,$$

where $q_n$ denotes the quantity of a commodity in the given period and $q_0$ denotes the corresponding quantity in the base period. The quantity relative measures the proportionate change in quantity. The quantity index number formulas are obtained by interchanging p's and q's in the weighted price index number formulas. Thus, the Laspeyres' Weighted Aggregative Quantity Index is given by

$$Q_{0n} = \frac{\sum q_n p_0}{\sum q_0 p_0} \times 100,$$

where prices of the base year are kept fixed as weights. The Paasche's Weighted Aggregative Quantity Index is given as

$$Q_{0n} = \frac{\sum q_n p_n}{\sum q_0 p_n} \times 100,$$

where prices of the given year are used as weights. Similarly, we define the Fisher's Quantity Index by the relation

$$Q_{0n} = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}} \times 100.$$

Other formulas are also defined and interpreted in a similar way. As the prices are held constant, changes are attributed to changes in quantity. These index numbers are also called the volume index numbers or Quantum index numbers.


When the price p of a commodity during a period is multiplied by its quantity q, produced, consumed or sold during the period, we get the total value, denoted by pq or v. A value relative is then defined as

$$\text{value relative} = \frac{\text{Total value during given year}}{\text{Total value during base year}} \times 100 = \frac{p_n q_n}{p_0 q_0} \times 100 = \frac{v_n}{v_0} \times 100.$$

It is interesting to note that (omitting the factor 100) a value relative is equal to the product of the price relative by the quantity relative. Symbolically, this may be written as

$$\text{value relative} = \frac{p_n q_n}{p_0 q_0} = \left( \frac{p_n}{p_0} \right) \left( \frac{q_n}{q_0} \right) = \text{Price relative} \times \text{Quantity relative}.$$

In a similar way, a weighted relative quantity-index is defined as

$$Q_{0n} = \frac{\sum (q_n / q_0)\, w}{\sum w} \times 100,$$

where $(q_n/q_0)$ are the quantity relatives and w denotes the weights.
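
The relation between the value, price and quantity relatives is easy to confirm numerically. The sketch below uses hypothetical figures for a single commodity; the helper name `relative` is an illustrative choice.

```python
def relative(given, base):
    """A relative in percentage form: (given / base) * 100."""
    return given / base * 100


# Hypothetical price and quantity of one commodity in the base and given periods.
p0, pn = 4.0, 5.0
q0, qn = 50.0, 60.0

price_relative = relative(pn, p0)            # 5/4, i.e. about 125
quantity_relative = relative(qn, q0)         # 60/50, i.e. about 120
value_relative = relative(pn * qn, p0 * q0)  # 300/200, i.e. about 150

# In percentage form the product has to be divided by 100 once.
product = price_relative * quantity_relative / 100
print(round(value_relative, 2), round(product, 2))
```

The division by 100 in the last step corresponds to the remark "omitting the factor 100" in the text: the identity holds exactly for the ratios, and one factor of 100 is removed when both relatives are expressed as percentages.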


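Example 5.11 Construct (i) a Laspeyres' type, (ii) a Paasche's type and (iii) Fisher's Ideal type quantity index numbers from the following data:

[Data table and table of products not recoverable. In what follows the subscript 0 refers to the base year, 1 to 1959 and 2 to 1960; the column totals are $\sum q_0 p_0 = 7800$, $\sum q_1 p_0 = 13710$, $\sum q_2 p_0 = 16370$, $\sum q_0 p_1 = 8800$, $\sum q_1 p_1 = 15400$, $\sum q_0 p_2 = 11000$ and $\sum q_2 p_2 = 23000$.]

(i) Laspeyres' type:

$$\text{Quantity index for 1959} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{13710}{7800} \times 100 = 175.8$$

$$\text{Quantity index for 1960} = \frac{\sum q_2 p_0}{\sum q_0 p_0} \times 100 = \frac{16370}{7800} \times 100 = 209.9$$

(ii) Paasche's type:

$$\text{Quantity index for 1959} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{15400}{8800} \times 100 = 175.0$$

$$\text{Quantity index for 1960} = \frac{\sum q_2 p_2}{\sum q_0 p_2} \times 100 = \frac{23000}{11000} \times 100 = 209.1$$

(iii) Fisher's Ideal type:

$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{13710}{7800} \times \frac{15400}{8800}} \times 100 = 175.4$$

$$Q_{02} = \sqrt{\frac{\sum q_2 p_0}{\sum q_0 p_0} \times \frac{\sum q_2 p_2}{\sum q_0 p_2}} \times 100 = \sqrt{\frac{16370}{7800} \times \frac{23000}{11000}} \times 100 = 209.5$$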


Example 5.12 The following series shows for U.K. total imports (a) the declared value, and (b) the value on the basis of average values in 1930.


U.K. Total Imports

[Data table not recoverable.]

Taking 1930 as base year, calculate index numbers (i) of average values and (ii) of volume for the years 1931-1936. (P.U., M.A., 1961; B.A./B.Sc., 1969; P.C.S. 1971)


It is easy to obtain (i) index numbers of average values for each year and (ii) a volume index, by writing the entries in columns 2 and 3 in symbols first as below:

| Year (1) | Declared value (2) | Value on basis of 1930 values (3) |
| n | $\sum p_n q_n$ | $\sum p_0 q_n$ |
(i) Index numbers of average values are thus obtained by dividing the entry in column 2 for each year by the corresponding entry in column 3, i.e. by Paasche's formula. Thus

$$\text{the average value (price) index for 1930} = \frac{\sum p_0 q_0}{\sum p_0 q_0} \times 100 = \frac{1044}{1044} \times 100 = 100,$$

$$\text{and the average value index for 1931} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{\sum p_1 q_1}{1069} \times 100 = 80.7.$$

Similarly, the average value indices for 1932-36 are obtained as 74.8, 71.4, 73.8, 7… and 78.7.

(ii) Index numbers of volumes are obtained by dividing the entries in column 3 by $\sum p_0 q_0$, i.e. by Laspeyres' method.

$$\text{Thus the volume index for 1930} = \frac{\sum p_0 q_0}{\sum p_0 q_0} \times 100 = \frac{1044}{1044} \times 100 = 100,$$

$$\text{and the volume index for 1931} = \frac{\sum p_0 q_1}{\sum p_0 q_0} \times 100 = \frac{1069}{1044} \times 100 = 102.4.$$

Proceeding in a similar way, we obtain the volume index numbers for 1932-36 as 89.9, 90.…, …, 96.9 and 103.2.


Calculate (i) the value index number; 
(ii) Laspeyres' weighted aggregative price index; 


(iii) Paasche's weighted average of relatives volume index. 
(B.Z.U., M.A. Econ. й 
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24649.42 
. 6808.88 

4178.75 
21637.25 21637.45 


[ X [оэ | 5727475 [4620975 [612860] -| 572450 | 


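(i) The value index number is

$\frac{\sum p_1 q_1}{\sum p_0 q_0} \times 100 = \frac{57274.75}{49006.9} \times 100 = 116.87$

The Laspeyres' weighted aggregative price index is

$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{46209.65}{49006.9} \times 100 = 94.29$

The Paasche's weighted average of relatives volume index is

$Q_{01} = \frac{\sum \left(\frac{q_1}{q_0}\right)(p_1 q_0)}{\sum p_1 q_0} \times 100 = \frac{57274.50}{46209.6} \times 100$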
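
The arithmetic of this example can be re-checked with a short script. This is only a sketch: the variable names are illustrative, the four totals are the column sums quoted in the solution, and the third result is computed here rather than copied from the text.

```python
# Column totals quoted in the worked solution.
sum_p1q1 = 57274.75          # current-year value, sum of p1*q1
sum_p0q0 = 49006.90          # base-year value, sum of p0*q0
sum_p1q0 = 46209.65          # current prices with base quantities
sum_rel_weighted = 57274.50  # sum of (q1/q0) * (p1*q0), as printed in the table

value_index = sum_p1q1 / sum_p0q0 * 100
laspeyres_price = sum_p1q0 / sum_p0q0 * 100
paasche_volume = sum_rel_weighted / sum_p1q0 * 100

print(round(value_index, 2))      # 116.87
print(round(laspeyres_price, 2))  # 94.29
print(round(paasche_volume, 2))   # computed here, not quoted from the text
```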


5.6 TESTS FOR INDEX NUMBER FORMULAE

From a theoretical view point, a "good" index number formula is required to satisfy the following tests proposed by Irving Fisher (1867-1947).

5.6.1 Time Reversal Test. This may be stated as follows:

"If the time subscripts 0 and n are interchanged in a price (or quantity) index number formula wherever they appear, then the resulting price (or quantity) formula should be the reciprocal of the original index formula, ignoring the factor 100." Symbolically, the test requires that

$P_{n0} = \frac{1}{P_{0n}}$ or, $P_{0n} \times P_{n0} = 1$.

As the index number is designed to measure changing values of prices or production, it is natural to expect that the formula should give the same result regardless of which of the two periods is taken as the base. The base period and the given period are only relative terms and hence should be interchangeable. An index number satisfying this test gives consistent results. This is a property we shall desire index numbers to obey.

For the purposes of illustration, let us consider some of the important formulae and see whether they do or do not satisfy this test.

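(i) The Laspeyres' price index expressed in ratio rather than percentage form (i.e. omitting the factor 100), is given by

$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0}$.

Interchanging the time subscripts 0 and n, we get

$P_{n0} = \frac{\sum p_0 q_n}{\sum p_n q_n}$.

But $P_{0n} \times P_{n0} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_0 q_n}{\sum p_n q_n} \neq 1$.

Hence it does not satisfy the time reversal test.

(ii) Again, taking the Paasche's price index formula, i.e. $P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n}$, and interchanging the time subscripts, we get

$P_{n0} = \frac{\sum p_0 q_0}{\sum p_n q_0}$, so that

$P_{0n} \times P_{n0} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0} \neq 1$,

which shows that this formula, too, does not obey the time reversal test.

(iii) Interchanging the time subscripts in the Marshall-Edgeworth price index, i.e. in the formula

$P_{0n} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)}$, we get

$P_{n0} = \frac{\sum p_0 (q_n + q_0)}{\sum p_n (q_n + q_0)}$.

Multiplying together, we obtain

$P_{0n} \times P_{n0} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \times \frac{\sum p_0 (q_n + q_0)}{\sum p_n (q_n + q_0)} = 1$.

Hence the time reversal test is satisfied.

(iv) Fisher's Ideal index number is

$P_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}$.

Interchanging the time subscripts, we have

$P_{n0} = \sqrt{\frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}}$.

Thus $P_{0n} \times P_{n0} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}} = 1$.

Hence it conforms to the time reversal test.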
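
The four time-reversal results can also be checked numerically. The sketch below is illustrative only: the prices and quantities are hypothetical (they do not come from the text), and the function names are chosen for this example. Each index is computed in ratio form, forwards and with the two periods swapped, and the product is compared with 1.

```python
from math import sqrt, isclose

# Hypothetical data for three commodities (not taken from the text).
p0, q0 = [10.0, 8.0, 5.0], [30.0, 15.0, 20.0]   # base period
pn, qn = [12.0, 10.0, 4.0], [25.0, 18.0, 30.0]  # given period

def dot(a, b):
    """Sum of products, e.g. dot(pn, q0) is the sum of pn*q0."""
    return sum(x * y for x, y in zip(a, b))

def laspeyres(p0, q0, pn, qn):
    return dot(pn, q0) / dot(p0, q0)

def paasche(p0, q0, pn, qn):
    return dot(pn, qn) / dot(p0, qn)

def marshall_edgeworth(p0, q0, pn, qn):
    q_sum = [a + b for a, b in zip(q0, qn)]
    return dot(pn, q_sum) / dot(p0, q_sum)

def fisher(p0, q0, pn, qn):
    return sqrt(laspeyres(p0, q0, pn, qn) * paasche(p0, q0, pn, qn))

formulas = {
    "Laspeyres": laspeyres,
    "Paasche": paasche,
    "Marshall-Edgeworth": marshall_edgeworth,
    "Fisher": fisher,
}

# P_0n * P_n0: the second factor swaps the roles of the two periods.
products = {
    name: f(p0, q0, pn, qn) * f(pn, qn, p0, q0) for name, f in formulas.items()
}

for name, product in products.items():
    verdict = "satisfied" if isclose(product, 1.0) else "not satisfied"
    print(f"{name}: P0n x Pn0 = {product:.4f} -> time reversal {verdict}")
```

With these figures the Laspeyres and Paasche products differ from 1, while the Marshall-Edgeworth and Fisher products equal 1 up to rounding, in line with the algebra.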
5.6.2 Factor Reversal Test. It may be stated in this way:

"If the factors p's (prices) and q's (quantities) occurring in a price (or quantity) index formula be interchanged (or reversed) so that a quantity (or price) index formula is obtained, then the product of the two index numbers should equal the value index number, i.e. $\frac{\sum p_n q_n}{\sum p_0 q_0}$." In other words, the factor reversal test requires that

(Price index) (Quantity index) = Value index,

as shown in the following illustrations.

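(i) Interchanging the factors p's and q's in the formula

$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0}$ (Laspeyres' price index), we obtain

$Q_{0n} = \frac{\sum q_n p_0}{\sum q_0 p_0}$ (Laspeyres' quantity index).

The product of these two is $\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum q_n p_0}{\sum q_0 p_0}$, which is not equal to $\frac{\sum p_n q_n}{\sum p_0 q_0}$, the value index.

Thus the factor-reversal test is not obeyed.

(ii) Again interchanging the factors p's and q's, the formula

$P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n}$ (Paasche's price index)

transforms into $Q_{0n} = \frac{\sum q_n p_n}{\sum q_0 p_n}$ (Paasche's quantity index).

Thus $P_{0n} \times Q_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum q_n p_n}{\sum q_0 p_n} \neq \frac{\sum p_n q_n}{\sum p_0 q_0}$ (Value index).

Hence Paasche's price index does not satisfy the factor reversal test.

(iii) The Marshall-Edgeworth's price index is

$P_{0n} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)}$.

Interchanging the factors p's and q's, the quantity index is obtained as

$Q_{0n} = \frac{\sum q_n (p_0 + p_n)}{\sum q_0 (p_0 + p_n)}$.

But $\frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \times \frac{\sum q_n (p_0 + p_n)}{\sum q_0 (p_0 + p_n)} \neq \frac{\sum p_n q_n}{\sum p_0 q_0}$,

i.e. (price index) (quantity index) $\neq$ value index.

Thus the factor-reversal test is not satisfied.

(iv) By definition, Fisher's price index is given by

$P_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}$.

Fisher's quantity index is given by

$Q_{0n} = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}}$.

Therefore $P_{0n} \times Q_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}} = \sqrt{\left(\frac{\sum p_n q_n}{\sum p_0 q_0}\right)^2} = \frac{\sum p_n q_n}{\sum p_0 q_0}$ = value index.

Hence, we see that Fisher's Ideal index satisfies the factor reversal test.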
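
The factor-reversal test lends itself to the same kind of numerical check. In this sketch the data are hypothetical and the helper names are assumptions made for illustration; the quantity index is obtained from each price formula simply by passing quantities where prices are expected and vice versa.

```python
from math import sqrt, isclose

# Hypothetical data for three commodities (not taken from the text).
p0, q0 = [10.0, 8.0, 5.0], [30.0, 15.0, 20.0]   # base period
pn, qn = [12.0, 10.0, 4.0], [25.0, 18.0, 30.0]  # given period

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Each function takes (base factor, base weight, given factor, given weight).
def laspeyres(f0, w0, fn, wn):
    return dot(fn, w0) / dot(f0, w0)

def paasche(f0, w0, fn, wn):
    return dot(fn, wn) / dot(f0, wn)

def marshall_edgeworth(f0, w0, fn, wn):
    w_sum = [a + b for a, b in zip(w0, wn)]
    return dot(fn, w_sum) / dot(f0, w_sum)

def fisher(f0, w0, fn, wn):
    return sqrt(laspeyres(f0, w0, fn, wn) * paasche(f0, w0, fn, wn))

value_index = dot(pn, qn) / dot(p0, q0)

results = {}
for name, f in [("Laspeyres", laspeyres), ("Paasche", paasche),
                ("Marshall-Edgeworth", marshall_edgeworth), ("Fisher", fisher)]:
    price_index = f(p0, q0, pn, qn)      # prices as factors, quantities as weights
    quantity_index = f(q0, p0, qn, pn)   # factors p and q interchanged
    results[name] = price_index * quantity_index
    ok = isclose(results[name], value_index)
    print(f"{name}: P x Q = {results[name]:.4f}, value index = {value_index:.4f}, "
          f"factor reversal {'satisfied' if ok else 'not satisfied'}")
```

Only the Fisher product reproduces the value index for these figures; the other three products differ from it, as the algebra predicts.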


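Example 5.14 Show with the help of the following data that the factor-reversal and time-reversal tests are satisfied by Fisher's Ideal formula.

[Table not recoverable from the source. The solution uses the totals $\sum p_n q_0 = 1900$, $\sum p_0 q_0 = 1440$, $\sum p_n q_n = 1880$ and $\sum p_0 q_n = 1464$.]

Fisher's price index for the current year (omitting the factor 100) is given by

$P_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} = \sqrt{\frac{1900}{1440} \times \frac{1880}{1464}}$

Fisher's quantity index for the current year (omitting the factor 100) is given by

$Q_{0n} = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}} = \sqrt{\frac{1464}{1440} \times \frac{1880}{1900}}$

$P_{0n} \times Q_{0n} = \sqrt{\frac{1900}{1440} \times \frac{1880}{1464} \times \frac{1464}{1440} \times \frac{1880}{1900}} = \sqrt{\frac{1880}{1440} \times \frac{1880}{1440}} = \frac{1880}{1440} = \frac{\sum p_n q_n}{\sum p_0 q_0}$ = value index.

Hence we see that Fisher's Ideal formula for index numbers satisfies the factor-reversal test.

Again interchanging the time subscripts in Fisher's price index (omitting the factor 100), we get

$P_{n0} = \sqrt{\frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}}$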


5.6.3 Circular Test. If $P_{ab}$ is the price index for the year 'b' with year 'a' as base, $P_{bc}$ is the price index for the year 'c' with year 'b' as base and $P_{ca}$ is the price index for the year 'a' with year 'c' as base, then the product of the three index numbers should equal 1. In other words, the circular test requires

$P_{ab} \times P_{bc} \times P_{ca} = 1$.

In general, the circular test is said to be satisfied, if

$P_{12} \times P_{23} \times P_{34} \times \cdots \times P_{(n-1)n} \times P_{n1} = 1$,

where $P_{ij}$ indicates the price index (without the factor 100) for the year 'j' with the year 'i' as the base.

This test is not obeyed by any of the weighted index numbers unless the weights are constant. For purposes of illustration, we consider the weighted aggregate price index number with fixed weights, say, $q_0$. Let there be three years denoted by 'a', 'b', and 'c'.

Then the weighted price index numbers are

$P_{ab} = \frac{\sum p_b q_0}{\sum p_a q_0}$, $P_{bc} = \frac{\sum p_c q_0}{\sum p_b q_0}$ and $P_{ca} = \frac{\sum p_a q_0}{\sum p_c q_0}$.

Multiplying them together, we get

$P_{ab} \times P_{bc} \times P_{ca} = \frac{\sum p_b q_0}{\sum p_a q_0} \times \frac{\sum p_c q_0}{\sum p_b q_0} \times \frac{\sum p_a q_0}{\sum p_c q_0} = 1$,

which shows that the circular test is satisfied.
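
A small numerical sketch of the circular test follows. The three price vectors and the quantity vectors are hypothetical, and the function name is an assumption for this example. The first product uses one fixed set of weights throughout; the second lets the weights change with the base year of each link, to show why constant weights matter.

```python
from math import isclose

# Hypothetical prices for three commodities in years a, b and c.
pa = [10.0, 8.0, 5.0]
pb = [12.0, 10.0, 4.0]
pc = [15.0, 9.0, 6.0]

def aggregate_index(p_base, p_given, weights):
    """Weighted aggregative price index in ratio form (factor 100 omitted)."""
    num = sum(p * w for p, w in zip(p_given, weights))
    den = sum(p * w for p, w in zip(p_base, weights))
    return num / den

# Case 1: fixed weights used in every link of the chain.
q_fixed = [30.0, 15.0, 20.0]
fixed_product = (aggregate_index(pa, pb, q_fixed)
                 * aggregate_index(pb, pc, q_fixed)
                 * aggregate_index(pc, pa, q_fixed))

# Case 2: weights change with the base year of each link (Laspeyres-type).
qa, qb, qc = [30.0, 15.0, 20.0], [25.0, 18.0, 30.0], [20.0, 20.0, 25.0]
varying_product = (aggregate_index(pa, pb, qa)
                   * aggregate_index(pb, pc, qb)
                   * aggregate_index(pc, pa, qc))

print(f"fixed weights:   Pab x Pbc x Pca = {fixed_product:.4f}")
print(f"varying weights: Pab x Pbc x Pca = {varying_product:.4f}")
```

With fixed weights the product is 1 up to rounding; with weights that vary from link to link it is not.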


5.7 CONSUMER PRICE INDEX NUMBER 


5.7.1 Meaning. A consumer price index (CPI) is designed to measure the changes in the composite price of a specified "basket" of goods and services during the given period as compared to the base period. The so-called "basket" would comprise various commodities consumed and services received in the base period or the given period, grouped under the headings: (i) Food and beverages, (ii) Clothing and footwear, (iii) Fuel and lighting, (iv) Housing, (v) Services, (vi) Miscellaneous. It is customary to exclude the durable goods and non-consumption monetary transactions such as contribution to Provident Fund, Savings Certificates, etc. The quantities consumed or the expenditure incurred on various groups are used as weights for the average retail prices prevailing in the locality concerned during the base and given periods.

Some countries still retain the name of the cost of living index number as "it measures the changes in the cost of living of a person or of a group of persons, having identical tastes for goods." A consumer price index is also called a household-budget price index or a retail price index.


5.7.2 Construction of Consumer Price Index Numbers. The following steps are involved in the compilation of the consumer price index number:

(i) Scope. The first step is to clearly specify the category of people and the locality where they reside as a consumer price index number relates to a particular segment of population, e.g. low-salaried employees, school teachers, industrial workers, etc. residing in a particular, well-defined area as a city or an industrial town. As far as possible, a homogeneous group of persons, i.e. persons who have identical patterns of living, is considered.

(ii) Household Budget Inquiry and Allocation of weights. The next step is to conduct a household budget inquiry of the category of people concerned in order to determine the goods and services to be included in the construction and to derive weights to be attached to them. This step has many practical problems as no two households have the same income and the same purchasing or consuming patterns. The inquiry or the household consumption survey should, therefore, include questions on family size, income, number of earners, quality and quantities of goods and services consumed and the money spent on them under various headings such as food and beverages, clothing and footwear, fuel and lighting, housing, and miscellaneous. The miscellaneous group includes items such as com[...], education, medical care, recreation, gifts, newspaper, barber, laundry and other charges.
An appropriate sampling technique is to be employed to collect consumption data of the households. The factory payrolls in case of industrial workers or the ration register in other cases may be used as frame. The households, after having been stratified into different income groups, are drawn into the sample by the method of either area sampling or systematic sampling with random start. The consumption data collected are then analysed according to the nature, quality and quantities of the consumed commodities, the proportion which expenditure on each group bears to the total expenditure of all groups. The weights are then attached to the various consumption groups in proportion to the money spent on them so as to truly reflect their relative importance.

(iii) Price Data. The third step is to collect data on consumer prices of goods and services included in the basket. These prices (retail prices) should be obtained both for the base period and the given period from the locality in which the people concerned reside or from which they make their purchases.

(iv) Computation of the Index. The last step is the computation of the consumer price index number with the help of an appropriate formula. For this purpose, one of the following two methods is employed with the same result.

The Aggregate Expenditure Method. Here the quantities consumed by households in the base year are taken as the weights. The quantity $\sum p_0 q_0$ represents the expenditure incurred on the various items in the base year and the quantity $\sum p_n q_0$ the expenditure in the given year with base year quantities. Laspeyres' formula, namely

$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$,

is applied. The given year quantities are not used as weights because they change from year to year and a fresh sample budget enquiry, involving considerable expense, labour and time, would be needed every year.

Household Budget Method. In this method, the price relatives are weighted by either the money spent by the households on various items or fixed weights derived from the sample household budget inquiry conducted in the category of people concerned. This method is called the Household Budget Method because the amounts of money spent by the households concerned are obtained from a household consumption survey. This method is also known as the Weighted Average of Relatives. The formula followed is

$P_{0n} = \frac{\sum WI}{\sum W}$,

where W may be $p_0 q_0$ and $I = \frac{p_n}{p_0} \times 100$.

The method of computation is illustrated by the following example.
Example 5.15 The following table gives average annual prices of ten commodities [...] years 1990 and 1994. Calculate the consumer price index number for 1994 on the basis of 1990 [...].

[Table not recoverable from the source. Legible entries: the last three commodities are Fuel, Cloth and House Rent; the quantities consumed are 20 [...], 10 kg, 12 kg, 4 kg, 3 kg, 30 litres, 35 kg, 200 kg, 22 meters and 1 unit; the column totals are $\sum p_0 q_0 = 297.25$ and $\sum p_n q_0 = 428.70$.]

Thus the consumer price index number for 1994 is

$P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = \frac{428.70}{297.25} \times 100 = 144.2$

This result indicates that the prices of consumption goods have increased by 44.2% for 1994 in comparison with 1990.
(ii) Consumer price index number by the Household Budget Method.

[Table not fully recoverable from the source. Its columns include Commodity, Unit, quantity consumed, price per unit, the weights W and the products W×I. The legible W×I entries are 3750.00, 3750.00, 3600.00, 1300.00, 1200.04, 2250.00, 1400.00, 15000.00, 4620.00 and 6000.00 (the last row also shows 200.00 in the preceding column); the totals are $\sum W = 297.25$ and $\sum WI = 42870.04$.]

Hence the consumer price index number for 1994 is

$P_{0n} = \frac{\sum WI}{\sum W} = \frac{42870.04}{297.25} = 144.2$

We see that the consumer price index numbers constructed by both methods are the same. However, the process of rounding off the figures may result in a small difference between the results.
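
The agreement between the two methods follows from the algebra: with $W = p_0 q_0$ and $I = (p_n/p_0) \times 100$, each product $WI$ equals $p_n q_0 \times 100$. The sketch below illustrates this on a hypothetical three-item basket; the figures and variable names are assumptions for illustration and are not the data of Example 5.15.

```python
from math import isclose

# Hypothetical basket (not the data of Example 5.15).
p0 = [10.0, 8.0, 5.0]    # base-period prices
pn = [12.0, 10.0, 4.0]   # given-period prices
q0 = [30.0, 15.0, 20.0]  # base-period quantities consumed

# Aggregate Expenditure Method: Laspeyres' formula.
sum_pnq0 = sum(p * q for p, q in zip(pn, q0))
sum_p0q0 = sum(p * q for p, q in zip(p0, q0))
cpi_aggregate = sum_pnq0 / sum_p0q0 * 100

# Household Budget Method: weighted average of price relatives.
weights = [p * q for p, q in zip(p0, q0)]        # W = p0 * q0
relatives = [b / a * 100 for a, b in zip(p0, pn)]  # I = (pn / p0) * 100
cpi_budget = sum(w * i for w, i in zip(weights, relatives)) / sum(weights)

print(round(cpi_aggregate, 2))  # 113.46
print(round(cpi_budget, 2))     # 113.46
```

Any difference between the two results in practice comes from rounding the price relatives before weighting them.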
5.7.3 Shortcomings or Drawbacks of Consumer Price Index Numbers. Some of the shortcomings of the consumer price index are given below:

i) It is practically difficult to clearly demarcate one category of people from another.

ii) As the construction of consumer price indices involves the sampling of goods and services, the sampling errors and biases may affect indices and render them suspect. Moreover, the frames used for household consumption inquiry may be incomplete and outdated.

iii) In case of certain goods, it is difficult to collect prices actually needed. For example, the prices for clothing usually relate to cloth and not to tailored clothes.

iv) It is also difficult to eliminate the effect of changes in quality and grade of goods and services purchased by households.

v) During the course of household budget inquiry, the price of goods and services and their demand may change; some commodities may change in quality, others may disappear and some new goods may enter the market.

vi) The consumer price indices cannot be used for comparing the price changes in consumption goods and services in two localities or in two households in the same locality as no two households can be homogeneous, i.e. they can neither have precisely the same pattern of consumption nor precisely the same basket of goods and services.

It is therefore relevant to point out that a consumer price index should not be wholly relied upon as it is an imperfect measure.


5.8 USES OF INDEX NUMBERS

A few uses of index numbers are given below:

i) The price index numbers are used to measure changes in a particular group of prices and help us in comparing the movement in prices of one commodity with another. They are also designed to measure the changes in the purchasing power of money.

ii) Index numbers of industrial production provide a measure of change in the level of industrial production in a country.

iii) The quantity index numbers show the rise or fall in the volume of production, volume of exports and imports, etc.

iv) The import and export price indices are used to measure the changes in the terms of trade of a country. By the terms of trade is meant the ratio of import to export prices.

v) Index numbers are also used to forecast business conditions of a country and to discover seasonal fluctuations and business cycles.

vi) The consumer price indices indicate the changes in retail prices of consumption goods and services. These movements in prices help the government in formulating its policies and in taking appropriate economic measures. They can be used to re-adjust the wages and to provide measures of relief by granting allowance and bonus to their employees to meet the increased costs by the industrial and commercial establishments as well as mills. They are also used to deflate the gross national product and wages to arrive at the real values of the product and real wages.

vii) Index numbers are also used to measure enrolment changes, intelligence quotients and performance of students.


5.9 LIMITATIONS OF INDEX NUMBERS

Some of the limitations are described below:

i) It is not practicable to price all the goods and services as well as to take into account changes in quantity or product.

ii) Since the construction of almost all index numbers is based on sampling of items, therefore the sampling errors creep into their calculations.

iii) In price indices the choice of a normal period is difficult as few periods can be considered normal for all segments of the economy.

iv) The results obtained by different methods of construction may not quite agree.

v) Comparisons of changes in variables over long periods are not reliable.

vi) All index numbers are not suitable for all purposes. The users are therefore strongly advised to essentially understand the purpose for which the index has been constructed.
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EXERCISES 


OBJECTIVE 


22 Answer ‘True’ or ‘False’. If the statement is not true then replace the underlined words with words 
that make the statement true: } 


i) Weighted aggregate index is the simplest form of an index number.

ii) When the base year values are used as weights, the weighted average of relatives price index is the same as Paasche's.

iii) There are three methods of construction of CPI numbers.

iv) Fisher's ideal index is the mean of Laspeyres' and Paasche's index numbers.

v) The most suitable average for index numbers is median.

vi) A Laspeyres price index is a CPI.

vii) If a price index increased from 150 to 200 over a certain period, the increase in prices was 50% from the beginning to the end of that period.

viii) If the current year and base year index numbers [...] and 160 respectively, then the value of Fisher's Ideal index number is 185.

ix) Prices are appropriate weights in a weighted aggregates quantity index.

x) If price and quantity of a [...] year are multiplied, we get value.


MULTIPLE CHOICE

i) If the price of a kg of [...] was Rs.40/- in 2000 and Rs.50/- in 2002, the simple price relative in 2002 is
a) 125
b) 100
c) 80
d) 50

ii) An un-weighted aggregates price index has a limitation that
a) It is difficult to calculate.
b) It is unduly influenced by the price variations of high priced commodities.
c) It is unduly influenced by the price variations of low priced commodities.
d) None of the above.

iii) The Paasche's price index is a weighted aggregate index in which the weights are based on
a) Current quantities.
b) Base period quantities.
c) [...]
d) None of the above.
iv) The best weights to be used in a quantity index calculated by the weighted average of relatives method are:
a) Base period price weights.
b) Current period price weights.
c) Base period quantity weights.
d) Base period value weights.

v) The CPI is basically
a) A fixed-weight index.
b) A Laspeyres index.
c) Both of the above.
d) None of the above.

vi) The Laspeyres price index is a weighted aggregate index in which the weights are based on
a) Current quantities.
b) Base period quantities.
c) Mean of base and current period quantities.
d) None of the above.

vii) The Laspeyres price index is:
a) Upward biased.
b) Downward biased.
c) No bias.
d) None of the above.

viii) The following is a price index number series: 1995, 100; 1997, 120; 2002, 150; which of the following statement is incorrect?
a) Prices increased by 50% from 1995 to 2002.
b) Prices increased by 30% from 1997 to 2002.
c) Prices in 1995 were 33 1/3% lower than in 2002.
d) Prices increased by 25% from 1997 to 2002.
ix) The following is a price index series for Lahore, based on 1990 = 100: 1995 = 120, 2000 = 125. Which of the following statement is correct?
a) Prices have increased by 5% from 1995 to 2000.
b) Prices in 1990 were 25% lower than in 2000.
c) Prices in 2000 were 1.2% higher than in 1995.
d) None of the above.

x) If wages of a group of workers increased from 1995 to 2000 by 10% and a relevant price index increased by 5%, real wages have increased over this period by:
a) 4.8%.
b) 10%.
c) [...]
d) None of the above.

xi) Which of the statement is true for [...] index number:
a) It meets time reversal test.
b) It meets factor reversal test.
c) It meets both time reversal as well as factor reversal tests.
d) None of the above.

xii) Which of the statement is true for [...]:
a) It meets time reversal test.
b) It meets factor reversal test.
c) It meets both time reversal as well as factor reversal tests.
d) None of the above.

xiii) A Laspeyres price index is
a) A cost of living index.
b) A weighted index.
c) Both of the above.
d) None of the above.


xiv) In 2000 the price for a certain type of fish was Rs.120/- per kg, and 450 tons were consumed. In 2001 the price for this type of fish was Rs.100/- per kg, and 350 tons of fish were consumed. If the simple price relative in 2000 is Rs.100/- then in 2001 simple price relative would be
a) 130
[remaining options not recoverable]

xv) The index number for a base year is always
a) Zero.
b) Greater than 100.
c) Less than 100.
d) None of the above.

SUBJECTIVE

[...] index number. What considerations would you weigh while constructing a whol[...]


5.1 Explain the concept of an index number. What is the procedure followed in the construction of an index number?

5.2 Define an index number. Discuss the main steps involved in the construction of index numbers of wholesale prices. Indicate their uses and limitations.
(C.S.S. 1964; P.U., B.A./B.Sc. 1986, [...])

5.3 What is an index number? Describe the important problems involved in the construction of an index number, in connection with the selection of commodities and the base year?
(P.U., B.A./B.Sc. [...])

5.4 It has been stated that the technique of index number construction involves four major factors:
(a) Choice of items (b) Base [...]
(c) Form of averages (d) Weight system
Do you agree with this? If so, explain these four factors and discuss the problems to which they give rise. If you do not, give your views on the main problems involved in its construction. (P.U., M.A. [...])

5.5 a) [...] Distinguish between fixed base and chain base methods of constructing index numbers. What are their respective merits and demerits?
(P.U., B.A./B.[...])
b) Describe the methods of averaging that can be used in constructing an index number, and point out their merits and demerits. (P.U., M.A. Econ. [...])

5.6 a) What is a weighted index number? Describe the various methods of weighting index numbers of prices. What are their advantages? (P.U., B.[...])
b) Present an interpretation of the (i) Laspeyres and (ii) Paasche price index numbers in terms of the total value of commodities.

5.7 Compare the following concepts:
i) Simple index and Composite index.
ii) Fixed base index and Chain base index.
iii) Laspeyres' price index and Paasche's price index.
iv) Weighted aggregative price index and Weighted average of relatives price index.
(P.U., B.A./B.[...])




38 


a) Define Laspeyres, Paasche and Marshall-Edgeworth types of index numbers. Show that 
the Paasche type is the reciprocal of the Laspeyres type with time subscripts reversed, 
given that the two have the same value. 


b) Define Fisher's /deal index number. Describe its advantages and disadvantages. 
(C.S.S.. 1960; P.U., B.A./B.Sc. 1973) 


Define and discuss the following index numbers; 


i) Quantity Index numbers. - 
ii) Value Index numbers. 


Explain the time and factor reversal tests. 
Which of the following formulae satisfy these tests and which do not? 


Ds > Xp Epon 4 
) 2190 i) Phy eee шуу шш. 
E podo Z poa > Po DENM 
ҚС E pilio 541) ч Урду Xp 
X po(ag + 4) E родо x on 
(PU, и 1994; B.Z.U. M.A., Econ., 1994) 
a) Explain theoretical tests which a good index ted to satisfy. 


49 
E Pra (dra * 4,5) 

Lig ata S the Маза асо? ра Вайт, 
Уру (Gra %4,,) 


reversal test but not the cu less the weights in the three years а, 5, с, are equal. 
(P.U., В.А./В.ӛс. 1977; 1978-5) 


al Р 


а) Prove that ће I ‘aggregate value index numbers 


satisfies the time 


Урлап. 
Eo Ро40 
reversal and Malar tests but do not satisfy the factor reversal test. 


b) Show that weighted aggregate price index numbers with fixed (quantity) weights satisfy 
the circular test. 


a) Show that the (i) Laspeyres and Gi) Paasche index do not satisfy the time reversal or 
factor reversal tests. 


b) Show that the Marshall-Edgeworth index satisfies the time-reversal test but not the factor- 
reversal test. (P.U., B.A./B.Sc., 1963; М.А. Econ. 1980) ‘A 


c) Prove that Fisher’s Ideal index satisfies both the time reversal and the factor reversal ter 
but does not conform to the circular test. 


ume satisfy the time 


/ 
a) Describe the methods for testing the consistency of index numbers. Explain Fisher's 
formula, giving an example. (C.S.S. 1960, P.U., B.AJB.Sc 




5.15 


5.16 


5:19 


5.20 


Үеаг 
Production 


522, 
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b) How would you calculate the Consumer Price Index for factory workers in 
И (P.U., М.А. Econ 


Explam the meaning of the consumer price index number. Describe the 
construction adopted. Explain the uses of consumer price index numbers. 
(P.U., В.А./В.5с., 1962, 63, 71, 81, 85; C.S.S., 1960; М.А. Ecom 


You are required to prepare the consumer price index for industrial workers im 
Describe how you will proceed. Prepare a short questionnaire for the inquiry. 
(P.U., B.A./B.sc- 


3) Explain the construction of the index of retail prices. 


b) Consumer price index for Rawalpindi stood at 137 in July and 140 in August. 
same months, the index number at Abbottabad stood at 150 and 151. Does this 
Abbottabad is costlier than Rawalpindi? Give your reasons in detail. 

7 (P.U., В.А./В 


а) Discuss the problems which arise іп constructing consumer price index numbers. 


b) Show that the Fisher's Ideal index satisfies both the ti versal and factor- 
Discuss the other properties of this index number. xS (P.U., B.AJB. 


Discuss the nature and method of construction a» index number of wages. E: 


possibilities of constructing such an index ni 
(CSS. 1961, 65; P.U., M2 


Find the price relatives for each year frag ic following average retail prices of 
(1) 1995 as a base (ii) 1998 as a base, Aye 


The following table giv 
1954 to 1963: 


1954 4% 1956 1957 1958 1959 1960 1961 1962 1963 
728 во 438 470 511 555 564 630 662 681 


Find ош index numbers by taking (i) 1954 as base year, (ii) average of 1958, 59, 60 
period. (P.U., M.A. 


The following table gives the average wholesale prices in rupees per unit of gold, 
cotton ma the years 2001—2006. 


Average price in Rs. Per unit 


ӨӨ Н OEA 


"M Кебир of cotton cloth (in million me) for Ра 


с 


25% ч 30.8} 334 А 35.3 | 36.0 
à 14.5 ' 17.1 11.6 


Using 2001 as the base period, compute the simple aggregative price indices and 
average of relatives price indices for the years 2002 to 2006. 
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The Prices of four commodities quoted at Multan for May 2001, April 2002 and May 2002 are 
given ES 


Compute price index numbers for April 2002 and May 2002 with May 2001 as base, using (i) 
simple aggregative method, (ii) simple average (mean) of price relatives and (iii) geometric 
mean of price relatives. 

Describe the chain base method used for the construction of index numbers from the following 
table and есиме such index шдеп. Discuss its merits against the fixed base method. 


Average Prices in Rs. per 40 kg 


(P.U., М.А. Econ., 1975) 


2) Compute the link relatives, ie. the price relatives in each year with reference to the 
previous year as 100, and calculate the index for three commodities for each year. 


b) Chain the above indices to a 2001 base. 


527 


5.28 


5.29 


+ 5.30 


b) Using 1998 as the base period and 2005 quantities as weights, compute the 
2 aggregative price index and the weighted average of relatives price index for 2 
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The following table gives the price relatives of four commodities for the years 200 

inclusive, the price of each commodity in 2001 being stated as 100. 

“Commodity инст сте шг шаш 
Commodity | 2001 | 2002 | 2003 | 2004 | 2005 | 


A 100 125 125 131 
B 100 120 120 127 
с 100 87 108 122 
р 100 75 150 140 


Calculate: 
i) Anindex number for each year, 2002-2005, using the simple A.M. of price relatives. 
ii) Index number for each year, using the chain base method. 


iii) Explain why, in general, the indices of (i) and the chained indices are not im 
agreement, , 
А firm divides its material into four main groups and tries to estimate the overall 


price changes by producing an index number weighted according to thé tonnages 
2003. Calculate this index for 2007 from the ia agn and comment 


result. 
2003 < 2007 
төм Әне 


Price (Rs.) Quanti 
2005 1998 


Б 


2005 


а 5 3.15 71 80. 
2.00 1.80 107 138 
2.60 1.75 62 57 


a) Using 1997 the base period and the base period quantities as weights, 
weighted-aggregative price index and the weighted average of relatives price ї 
2005. " 


100 
56 


Comp Ие weighted-aggregative price index for 2007 with 1997 = 100 Бу) 
ге!“ 15 (ii) Paasche's method. 
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Compute the following index numbers for the data in question 5.28 by Paasche's method: 

i) Weighted aggregative price index. 

ii) Weighted average of relatives price index. 

Construct the following "eei numbers of pu for 2004 and 2005 from the given data. 


ане. 
Commodi EEEE 


Compute the index numbers of Marshall ri and Fisher's “Ideal” type for the 
following data. 


(P.U. B.AJB.Sc., 1973) | 


Given the following, construct — index number for 
i) 1964, king i ML 


(P-U., М:А.., Econ.. 1968) 
construct pric ee PE] err foodgrain data, 


е voL see 


ipe 935 | 812 | 878 | 3,974 | 3,862 ET 930 
11.25 | 11.73 | 12.08 973 
7.00 | 7.68 | 823 e. 
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5.36 


5.39 


and for 1962, using the for wheat, 8 for barley and 5 for maize. 
(B.ZU., M.A., 
-Compute Fisher's pri«essti аа б 
Quantity (units) Value (Rs.) 


Calculate (i) Laspeyres' index, (ii) Paasche’s index, (iii) Fisher's Ideal index, (iv) Mar 
geworth index, (v) Walsh index and (vi) Palg ve index for the folloy ing dairy produc 


[Prices | Quantities Produced | 
Dairy Products Fiere [тө [от [Tose [ тө» | 2007 | 


Milk cS 3.89 | 4.13 | 9,675 | 9,717 | 10,436 
Butter 61.50 62.20 | 59.70 on E. E: 
Cheese 34.80 35.40 | 38.90 


a) Obtain (i) a volume index and (ii) an index of average values for each year 
flowing сца 


Taking 1960 as base, терме. index number for the three cereal together, 


1976 . 1976 


P.U, B. 


index number could be comp is question. 
s porro [жау [вә 1 зөт. 


p ji e sut 
60 72 42 360 
30 33 297 
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Using 1955 as the base period and base period quantities as the weights, compute the weighted 
aggregative price index and the weighted average of relative price index for 1965. 
(P.U., B.A/B.Sc. 1988) 


Compute a quantity index number for (i) 1943 on 1938 as base, using 1938 valves as weights; 
and another for (ii) 1938 on 1943 as base, using 1943 values as weights. 


. . Export of Cotton Yarns 
and Mere 


——Á——: 


(P.U., B.A., (Hons.), 1961) 


my te «Желе меен Pr 
following information. 


есте [S| ST 


Prices (190 Sae ETETE 
Prices (1920) £65 £23 
What changes in cost of living figures of 1929 as compared with that of 1928 are seen? 
(P.U., В.А /В.5с. 1974, 1981) 


€ де consumer een ы рд hoe for 1940 on the basis of 1939 from the following 


(P.U., B.AB.Sc. 1973) 
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INTRODUCTION TO STATISTICAL 
Compute the consumer price index number for the following data for 2007 with 2000 


a) Define an index number, Di: 
price index numbers, 
b) Given: 


sco e main steps involved in the construction of 


30 33 450 


Compute the following: 
i) Fisher's quantity index number for 2006. 
ii) Simple aggregative value index for 2006. | (Р.О. B.AJB 
Obtain i) A simple aggregative value index 
ii) An index of average values, for each year from the following data: 
Retained Imports (Billions Rs. 


Declared value | Value on the basis of 2002 values 
860 860 


950 832 
807 
704 


(P.U.B 


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#1 INTRODUCTION 


The word probability has two basic meanings: (i) a quantitative measure of uncertainty and (ii) a measure of degree of belief in a particular statement or problem.

Probability and statistics are fundamentally interrelated. Probability is often called the vehicle of statistics. The area of inferential statistics, in which we are mainly concerned with drawing inferences from experiments or situations involving an element of uncertainty, leans heavily upon probability theory. Uncertainty is also an inherent part of statistical inference as inferences are based on a sample, and a sample, being a small part of the larger population, contains incomplete information. A similar type of uncertainty occurs when we toss a coin, draw a card or throw dice, etc. The uncertainty in all these cases is measured in terms of probability.

It is always clear what we mean when we make statements of the type that it is very likely to rain, or I have a fair chance of passing the annual examination, or A will probably win a prize, etc. In each of these statements, the natural state of uncertainty is expressed, but on the basis of past evidence, we have some degree of personal belief in the truth of each statement.


The foundations of probability were laid by two French mathematicians of the seventeenth century, Blaise Pascal (1623-1662) and Pierre De Fermat (1601-1665), in connection with gambling problems. Later on it was developed by Jakob Bernoulli (1654-1705), Abraham De Moivre (1667-1754) and Pierre Simon Laplace (1749-1827). The modern treatment of probability theory, which consists of a few axioms and the rules resulting from them, was developed during the twenties and thirties of the twentieth century.

Today the probability theory has a wide field of application and is used to make intelligent decisions in Economics, Management, Operations Research, Sociology, Psychology, Astronomy, Engineering and Genetics where risk and uncertainty are involved.

Probability is best understood through the application of the modern set theory, the basic notions of which are reviewed here.


A set is any well-defined collection or list of distinct objects, e.g. a group of students, the books in a library, the integers between 1 and 100, all human beings on the Earth, etc. The term well-defined here means that any object must be classified as either belonging or not belonging to the set under consideration, and the term distinct implies that each object must appear only once. The objects that are in a set are called members or elements of that set. Sets are usually denoted by capital letters such as A, B, C, etc., while their elements are represented by small letters such as a, b, c, y, etc. Elements are enclosed by braces to represent a set, e.g.

$A = \{a, b, x, y\}$ or $B = \{1, 2, 3, 7\}$

If x is an element of a set A, we write $x \in A$, which is read as "x belongs to A" or "x is in A". If x does not belong to A, i.e. x is not an element of A, we write $x \notin A$. The number of a set A, denoted by $n(A)$, is defined as the number of elements in A.

A set that has no elements is called an empty or a null set and is denoted by the symbol $\varnothing$. It must be noted that $\{0\}$ is not an empty set as it contains an element 0. If a set contains only one element, it is called a unit set or a singleton set. It is also important to note the difference between an element x and a unit set $\{x\}$. The elements of a set may be sets themselves.


A set may be specified in two ways. We may either give a list of all the elements of a set (the "Roster" method), e.g.

$A = \{1, 3, 5, 7, 9, 11\}$; $B = \{\text{a book, a city, a clock, a teacher}\}$;

or we may state a rule that enables us to determine whether or not a given object is a member of the set (the "rule" method or "set builder" method), e.g.

$A = \{x \mid x \text{ is an odd number and } x < 12\}$

meaning that A is a set of all elements x such that x is an odd number and x is less than 12. The vertical line is read as "such that". The repetition or the order in which the elements of a set occur does not change the nature of the set. The size of a set is given by the number of elements present in it. This number may be finite or infinite. Thus a set is finite when it contains a finite number of elements, otherwise it is an infinite set. The empty set is regarded as a finite set. Examples of finite sets are the sets

i) $A = \{1, 2, 3, \ldots, 99, 100\}$;
ii) B={x|x шіліктің offe уйй, ҚМ ` 
iii) $C = \{x \mid x \text{ is a printing mistake in a book}\}$;
iv) D={x|x Beg pester с etc. 
and the sets

i) $A = \{x \mid x \text{ is an even integer}\}$;
ii) $B = \{x \mid x \text{ is a real number between 0 and 1 inclusive}\}$, i.e., $B = \{x \mid 0 \le x \le 1\}$;
iii) $C = \{x \mid x \text{ is a point on a line}\}$;
iv) $D = \{x \mid x \text{ is a sentence in the English language}\}$; etc. are the examples of infinite sets.
A set A is said to be in one-to-one correspondence with a set B when every element of set A can be made to correspond to one and only one element of set B and conversely. For example, if

$A = \{1, 2, 3, 4\}$ and $B = \{a, b, c, d\}$

then the sets A and B are in one-to-one correspondence.

A set is called countably infinite or denumerable when its elements can be put into a one-to-one correspondence with the sequence of positive integers. A set is said to be non-denumerable when its elements cannot be enumerated.


6.2.1 Subsets. A set that consists of some elements or members of another set is called a subset of that set. For example, if B is a subset of A, then every member of set B is also a member of set A and we write

$B \subset A$ or equivalently $A \supset B$

which is read as "B is a subset of A" or "B is contained in A", or "A contains B". For example, if

$A = \{1, 2, 3, 4, 5, 10\}$ and $B = \{1, 3, 5\}$

then $B \subset A$, i.e. B is contained in A.

It should be noted that a set A is always regarded a subset of itself and an empty set $\varnothing$ is considered to be (or accepted as) a subset of every set. Two sets A and B are equal or identical, if and only if they contain exactly the same elements. That is, $A = B$ if and only if $A \subset B$ and $B \subset A$. If a set B contains some but not all of the elements of another set A while A contains each element of B, i.e. if

$B \subset A$ and $B \neq A$

then the set B is defined to be a proper subset of A. The large or the original set, of which all the sets we talk about are subsets, is called the universal set or the space and is generally denoted by $S$ or $\Omega$. The universal set thus contains all possible elements under consideration. It is also regarded a subset of itself. A set S with n elements will produce $2^n$ subsets, including $S$ and $\varnothing$.
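The subset count can be checked by brute force. The following is a minimal Python sketch, not part of the original text; the helper name `all_subsets` is an assumption made for illustration, and the sets A and B are the ones used in the subset example above.

```python
from itertools import combinations

def all_subsets(elements):
    """Return every subset of `elements` as a list of frozensets."""
    items = list(elements)
    subsets = []
    for r in range(len(items) + 1):           # subset sizes 0, 1, ..., n
        for combo in combinations(items, r):
            subsets.append(frozenset(combo))
    return subsets

A = {1, 2, 3, 4, 5, 10}
B = {1, 3, 5}

subsets_of_B = all_subsets(B)
print(len(subsets_of_B) == 2 ** len(B))       # a set with n elements has 2^n subsets
print(B <= A, B < A)                          # subset test and proper-subset test
```

Python's `<=` and `<` on sets correspond to "is a subset of" and "is a proper subset of" respectively.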


6.2.2 Venn Diagram. A diagram that is understood to represent sets by circular regions, parts of circular regions or their complements with respect to a rectangle representing the space S is called a Venn diagram, named after the English logician John Venn (1834-1923). The Venn diagrams are used to represent sets and subsets in a pictorial way and to verify the relationship among sets and subsets. An example of a Venn diagram follows:

[Figure: A Simple Venn Diagram]


6.2.3 Operations on Sets. Let the sets A and B be the subsets of some universal set S. Then these sets may be combined and operated on in various ways to form new sets which are also subsets of S. The basic operations are union, intersection, difference and complementation.

i) The union or sum of two sets A and B, denoted by $A \cup B$, and read as "A union B" or "A cup B", means the set of all elements that belong to at least one of the sets A and B, that is

$A \cup B = \{x \mid x \in A \text{ or } x \in B\}$

By means of a Venn diagram, $A \cup B$ is shown by the shaded area as below:

[Venn diagram: $A \cup B$ is shaded]

ii) The intersection of two sets A and B, denoted by $A \cap B$ or by $AB$, and read as "A intersection B" or "A cap B", means the set of all elements that belong to both A and B; that is

$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$




Diagrammatically $A \cap B$ is shown by the shaded area as below:

[Venn diagram: $A \cap B$ is shaded]

The operations of union and intersection have been defined for two sets only. They can conveniently be extended to any finite number of sets.

Two sets A and B are defined to be disjoint or mutually exclusive or non-overlapping when they have no elements in common, i.e. when their intersection is an empty set or $A \cap B = \varnothing$. On the other hand, two sets A and B are said to be conjoint when they have at least one element in common.


iii) The difference of two sets A and B, denoted by $A - B$ or by $A - (A \cap B)$, is the set of all elements of A which do not belong to B. Symbolically,

$A - B = \{x \mid x \in A \text{ and } x \notin B\}$

It is to be pointed out that in general $A - B \neq B - A$, because $B - A = \{x \mid x \in B \text{ and } x \notin A\}$. The shaded area of the following Venn diagram shows the difference $A - B$.

[Venn diagram: $A - B$ is shaded]

It is to be noted that when A and B are disjoint, the difference $A - B$ coincides with the set A.

iv) Complementation. The particular difference $S - A$, that is, the set of all those elements of S which do not belong to A, is called the complement of A and is denoted by $\bar{A}$ or by $A^c$. In symbols

$\bar{A} = \{x \mid x \in S \text{ and } x \notin A\}$

The complement of S is the empty set $\varnothing$. The complement of A is shown by the shaded area in the following Venn diagram.

[Venn diagram: $\bar{A}$ is shaded]

It should be noted that $A - B$ and $A \cap \bar{B}$, where $\bar{B}$ is the complement of set B, are the same set.
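The four operations map directly onto Python's built-in `set` type. The sketch below is illustrative only; the particular sets S, A and B are assumed examples and do not come from the text.

```python
# Assumed example sets: S plays the role of the universal set.
S = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

union = A | B                 # elements in at least one of A, B
intersection = A & B          # elements in both A and B
difference = A - B            # elements of A that are not in B
complement_A = S - A          # complement of A with respect to S
complement_B = S - B

print(union, intersection, difference, complement_A)
print(difference == A & complement_B)   # A - B is the same set as A ∩ B-complement
print(A - B != B - A)                   # the difference is not symmetric in general
```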
6.2.4 The Algebra of Sets. The algebra of sets provides us with laws which can be used to solve many problems in probability calculations. Let A, B and C be any subsets of the universal set S. Then we have


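i) Commutative laws
$A \cup B = B \cup A$ and $A \cap B = B \cap A$
ii) Associative laws
$(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$
iii) Distributive laws
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ and $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
iv) Idempotent laws
$A \cup A = A$ and $A \cap A = A$
v) Identity laws
$A \cup S = S$, $A \cap S = A$, $A \cup \varnothing = A$, and $A \cap \varnothing = \varnothing$
vi) Complementation laws
$A \cup \bar{A} = S$, $A \cap \bar{A} = \varnothing$, $\overline{(\bar{A})} = A$, $\bar{S} = \varnothing$ and $\bar{\varnothing} = S$
vii) De Morgan's laws
$\overline{A \cup B} = \bar{A} \cap \bar{B}$ and $\overline{A \cap B} = \bar{A} \cup \bar{B}$

These laws can be verified by means of Venn diagrams.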
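As a complement to Venn diagrams, the laws can also be checked exhaustively on a small universal set. The sketch below is an illustration under the assumption $S = \{1, 2, 3\}$; the function names are hypothetical. A check on one small set is not a proof, only a sanity test of the distributive and De Morgan's laws.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})      # assumed small universal set

def subsets(universe):
    """All subsets of `universe` as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(X):
    """Complement of X with respect to S."""
    return S - X

def laws_hold():
    """Test the distributive and De Morgan's laws for every triple (A, B, C)."""
    for A, B, C in product(subsets(S), repeat=3):
        if A & (B | C) != (A & B) | (A & C):
            return False
        if A | (B & C) != (A | B) & (A | C):
            return False
        if comp(A | B) != comp(A) & comp(B):
            return False
        if comp(A & B) != comp(A) | comp(B):
            return False
    return True

print(laws_hold())
```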

6.2.5 Partition of Sets. A partition of a set S is a sub-division of the set into non-empty subsets $A_1, A_2, \ldots, A_n$ that are disjoint and exhaustive, i.e.
i) $A_i \cap A_j = \varnothing$ where $i \neq j$;
ii) $A_1 \cup A_2 \cup \cdots \cup A_n = S$.
The subsets in a partition are called cells. Let us consider a set $S = \{a, b, c, d, e\}$. Then $\{a, b\}$ and $\{c, d, e\}$ form a partition of S as each element of S belongs to exactly one cell.
6.2.6 Class of Sets. A set of sets is called a class, e.g. in a set of lines, each line is a set of points. The class of all subsets of a set A is called the power set of A and is denoted by $\mathcal{P}(A)$. For example, if $A = \{H, T\}$, then $\mathcal{P}(A) = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}$.


6.2.7 Cartesian Product Sets. The Cartesian product of sets A and B, denoted by $A \times B$ (read as "A cross B"), is a set that contains all ordered pairs $(x, y)$, where x belongs to A and y belongs to B. Symbolically,

$A \times B = \{(x, y) \mid x \in A \text{ and } y \in B\}$

This set is also called the Cartesian set of A and B, named after the French mathematician René Descartes (1596-1650). The product of a set A by itself is denoted by $A^2$. This concept of product may be extended to any finite number of sets.

Let $A = \{H, T\}$ and $B = \{1, 2, 3, 4, 5, 6\}$. Then the Cartesian product set is the collection of the following twelve (2 x 6) ordered pairs:

A x B = {(H, 1); (H, 2); (H, 3); (H, 4); (H, 5); (H, 6);
         (T, 1); (T, 2); (T, 3); (T, 4); (T, 5); (T, 6)}

Clearly, these twelve elements together make up the universal set S when a coin and a die are tossed together. A die (plural, dice) is a cube of wood or ivory whose six faces are marked with dots as shown below:

[Figure: the six faces of a die]

It is relevant to note that, in general, $A \times B \neq B \times A$.
The product $A \times B$ may conveniently be found by means of the so-called tree diagram shown below:

    A      B      A x B
    H      1      (H, 1)
           2      (H, 2)
           3      (H, 3)
           4      (H, 4)
           5      (H, 5)
           6      (H, 6)
    T      1      (T, 1)
           2      (T, 2)
           3      (T, 3)
           4      (T, 4)
           5      (T, 5)
           6      (T, 6)

The "tree" is constructed from left to right. A tree diagram is a useful device for enumerating all the possible outcomes of two or more sequential events; the possible outcomes are represented by the individual paths or branches of the tree. This diagram is also used when we apply multiplication rules to compute probabilities.
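The same enumeration can be produced programmatically. The following is a small sketch using Python's standard `itertools.product`, which walks the branches of the tree in the order shown above; the variable names are chosen for illustration.

```python
from itertools import product

A = ["H", "T"]                 # outcomes of the coin
B = [1, 2, 3, 4, 5, 6]         # outcomes of the die

AxB = list(product(A, B))      # ordered pairs (x, y) with x in A and y in B
BxA = list(product(B, A))

for pair in AxB:
    print(pair)

print(len(AxB) == len(A) * len(B))   # twelve ordered pairs, 2 x 6
print(set(AxB) == set(BxA))          # False: A x B differs from B x A
```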

6.2.8 Relation and Function. A relation from a set A to a set B is a subset of the Cartesian product $A \times B$. Such a relation is usually called a binary relation. That is, a relation is an association between two or more objects. The set of the first elements of a binary relation is called the domain of the relation, while the set of the second elements of the relation is called the range. For example, if a relation is $F = \{(1, 4), (2, 7), (3, 12)\}$, then its domain is $D = \{1, 2, 3\}$ and its range is $R = \{4, 7, 12\}$.

A function is a rule that assigns values in some manner from a set A to a set B such that for each element x of A, there is a unique element y of B. Such an assignment is usually written as $f : A \to B$, i.e. f is a function from A to B. In other words, a function is a special kind of binary relation that relates each element of the domain to a unique element of the range. It is to be emphasized that a function is a binary relation in which no first element is repeated. The value of the function f at $x \in A$ is denoted by

$y = f(x) \in B$

The variable x that represents the elements of the domain is called the independent variable and the variable y $(= f(x))$ representing elements of the range is referred to as the dependent variable. Such functions are also called single-valued functions. A function whose range consists of numbers is called a numerical function. A function whose domain and range consist of sets of real numbers is said to be a real valued function of a real variable. A function $f(x)$ is defined to be an even function if, for every x in a given range, $f(-x) = f(x)$, e.g. $f(x) = x^2$. A function $f(x)$ having the property that $f(-x) = -f(x)$ is said to be an odd function, e.g. $f(x) = x^3$.
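The definitions of domain, range and function can be expressed in a few lines of Python. This is a sketch for illustration; the helper names `domain`, `range_of` and `is_function` and the relation G are assumptions, while F is the relation from the example above.

```python
def domain(relation):
    """Set of first elements of the ordered pairs."""
    return {x for x, _ in relation}

def range_of(relation):
    """Set of second elements of the ordered pairs."""
    return {y for _, y in relation}

def is_function(relation):
    """True when no first element is repeated among the distinct pairs."""
    firsts = [x for x, _ in set(relation)]
    return len(firsts) == len(set(firsts))

F = {(1, 4), (2, 7), (3, 12)}      # relation from the text
G = {(1, 4), (1, 7)}               # assumed counter-example: 1 is paired twice

print(domain(F), range_of(F))
print(is_function(F), is_function(G))
```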


6.3 RANDOM EXPERIMENT

The term experiment means a planned activity or process whose results yield a set of data. A single performance of an experiment is called a trial. The result obtained from an experiment or a trial is called an outcome.

An experiment which produces different results even though it is repeated a large number of times under essentially similar conditions is called a random experiment. The tossing of a fair coin, the throwing of a balanced die, drawing of a card from a well-shuffled deck of 52 playing cards, selecting a sample, etc. are examples of random experiments. A random experiment has three properties:

i) The experiment can be repeated, practically or theoretically, any number of times.

ii) The experiment always has two or more possible outcomes. An experiment that has only one possible outcome is not a random experiment.

iii) The outcome of each repetition is unpredictable, i.e. it has some degree of uncertainty.

It is to be remembered that an ordinary deck of playing cards contains 52 cards arranged in 4 suits of 13 each. The four suits are called diamonds, hearts, clubs and spades; the first two are red and the last two are black. The face values, called denominations, of the 13 cards in each suit are ace, 2, 3, ..., 10, jack, queen and king. The term honour card refers to the denominations ace, 10, jack, queen and king. The face cards are king, queen and jack. These cards are used for various games such as whist, bridge, etc.


6.3.1 Sample Space. A set consisting of all possible outcomes that can result from a random experiment (real or conceptual) is defined to be a sample space for the experiment and is denoted by the letter S. Each possible outcome is a member of the sample space and is called a sample point in that space. For instance, the experiment of tossing a coin results in either of the two possible outcomes: a head (H) or a tail (T); landing on its edge or rolling away is not considered. The sample space for this experiment may be expressed in set notation as $S = \{H, T\}$. The sample space for tossing two coins once (or tossing a coin twice) will contain four possible outcomes denoted by $S = \{HH, HT, TH, TT\}$. Clearly, S is the Cartesian product $A \times A$, where $A = \{H, T\}$. Similarly, the sample space S for the random experiment of throwing two six-sided dice can be described by the Cartesian product $A \times A$, where $A = \{1, 2, 3, 4, 5, 6\}$. In other words,

$S = A \times A = \{(x, y) \mid x \in A \text{ and } y \in A\}$

where x denotes the number of dots on the upper face of the first die and y, the number of dots on the upper face of the second die; and S contains 36 outcomes or sample points, which may also be conveniently represented in the following manner:


S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

This sample space may more briefly be expressed as

$S = \{(i, j) \mid i = 1, 2, 3, 4, 5, 6;\ j = 1, 2, 3, 4, 5, 6\}$


A sample space that contains a finite number of sample points is said to be a finite sample space. It is defined to be a discrete sample space if the sample points can be placed in a one-to-one correspondence with the positive integers or if it is finite. If it satisfies neither of these criteria, it is called continuous. It should be noted that our sample space S will be finite unless otherwise stated.

6.3.2 Events. An event is an individual outcome or any number of outcomes (sample points) of a random experiment or a trial. In set terminology, any subset of a sample space S of the experiment is called an event. An event that contains exactly one sample point is defined a simple event. A compound event contains more than one sample point and is produced by the union of simple events. For instance, the occurrence of a 6 when a die is thrown is a simple event, while the occurrence of a sum of 10 with a pair of dice is a compound event, as it can be decomposed into three simple events (4, 6), (5, 5) and (6, 4).
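The idea that an event is simply a subset of the sample space can be illustrated for the two-dice experiment. The sketch below builds S as a Cartesian product and picks out the compound event "sum of 10"; the variable names are chosen for illustration.

```python
from itertools import product

faces = range(1, 7)
S = set(product(faces, repeat=2))          # sample space for two dice

# Compound event: the two faces add up to 10.
sum_is_10 = {(x, y) for (x, y) in S if x + y == 10}

print(len(S))                # number of sample points
print(sorted(sum_is_10))     # the simple events making up the compound event
print(sum_is_10 <= S)        # an event is a subset of S
```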

An event A is said to occur if and only if the outcome of the experiment corresponds to some element of A. The event "not-A" is denoted by $\bar{A}$ or $A^c$ and is the negation (or complementary event) of A. For example, the complement of "heads" is "tails" for the toss of one coin; the complement of "at least one head" on 4 tosses of a coin is "no heads". A sample space consisting of n sample points can produce $2^n$ different subsets (events), as each sample point can either be included or excluded in forming a subset.

To illustrate, let us consider a sample space consisting of three sample points, i.e. $S = \{a, b, c\}$. Then the eight possible subsets are

$\varnothing$, $\{a\}$, $\{b\}$, $\{c\}$, $\{a, b\}$, $\{a, c\}$, $\{b, c\}$, $\{a, b, c\}$

Each of these subsets is an event. The subset $\{a, b, c\}$ is the sample space itself and is also an event. It always occurs and is known as the certain or sure event. The empty set $\varnothing$ is also an event, known as the impossible event, because it can never occur.

This class of $2^3 = 8$ subsets (events) can be thought of as a field which is denoted by F. These events have the following properties:

i) The union of any number of events will result in a set that belongs to F.

ii) The intersection of any number of events will result in a set that belongs to F.

iii) The difference of any two events belongs to F.

iv) The complement of any event belongs to F.


Mutually Exclusive Events. Two events A and B of a single experiment are said to be mutually exclusive or disjoint if and only if they cannot both occur at the same time. That is, they have no points in common. For instance, when we toss a coin, we get either a head or a tail, but not both; the two events head and tail are therefore mutually exclusive. When a die is rolled, the outcomes are mutually exclusive as we get one and only one of six possible outcomes 1, 2, 3, 4, 5 or 6. Similarly, a student either passes or fails, a single birth must be either a boy or a girl, it cannot be both, etc. Three or more events originating from the same experiment are mutually exclusive if pairwise they are mutually exclusive. If the two events can occur at the same time, they are not mutually exclusive, e.g., if we draw a card from an ordinary deck of 52 playing cards, it can be both a king and a diamond. Therefore kings and diamonds are not mutually exclusive. Similarly, inflation and recession are not mutually exclusive events.

Exhaustive Events. Events are said to be collectively exhaustive when the union of mutually exclusive events is the entire sample space S. Thus, in our coin-tossing experiment, head and tail are a collectively exhaustive set of events. A group of mutually exclusive and exhaustive events is called a partition of the sample space. For instance, events $A$ and $\bar{A}$ form a partition as they are mutually exclusive and their union is the entire sample space.

Equally Likely Events. Two events A and B are said to be equally likely when one event is as likely to occur as the other. In other words, each event should occur in equal number in repeated trials. For example, when a fair coin is tossed, the head is as likely to appear as the tail, and the proportion of times each side is expected to appear is $\frac{1}{2}$.


6.3.3 Events and Symbolic Representations. For convenience, the verbal statements of some events and their corresponding symbolic representations in sets are listed below:

Verbal statement                              Set notation

Event A                                       $A \subset S$
Event A is impossible                         $A = \varnothing$
Event A is sure (certain)                     $A = S$
Event not-A (Event A does not occur)          $\bar{A} = S - A$
Event A or event not-A                        $A \cup \bar{A} = S$
Event A or event B                            $A \cup B$
Event A and event B                           $A \cap B$
Events A and B are mutually exclusive         $A \cap B = \varnothing$
Events A and B are exhaustive                 $A \cup B = S$
B occurs when A occurs                        $A \subset B$
A occurs but B does not occur                 $A \cap \bar{B}$
B occurs given that A has occurred            $B \mid A$


6.3.4 Counting Sample Points. When the number of sample points in a sample space S is very large, it becomes very inconvenient and difficult to list them all and to count the number of points in the sample space S and in the subsets of S. We then need some methods or rules which help us to count the number of all sample points without actually listing them. A few of the basic rules frequently used in counting are briefly described here.

(a) Rule of Multiplication. If a compound experiment consists of two experiments such that the first experiment has exactly m distinct outcomes and, if corresponding to each outcome of the first experiment there can be n distinct outcomes of the second experiment, then the compound experiment has exactly mn outcomes.

For example, the compound experiment of tossing a coin and throwing a die together consists of two experiments: the coin-tossing with two distinct outcomes (H, T), and the die-throwing with six distinct outcomes (1, 2, 3, 4, 5, 6). The total number of possible distinct outcomes of the compound experiment is therefore 2 x 6 = 12, as each of the two outcomes of the coin-tossing experiment can occur with each of the six outcomes of the die-throwing experiment (see the Cartesian product of these two sets discussed earlier). The tree diagram given with it provides a good illustration of this rule.

The rule of multiplication can be readily extended to compound experiments consisting of any number of experiments performed in a given sequence.


(b) Rule of Permutation. A permutation is any ordered subset from a set of n distinct objects. The number of permutations of r objects, selected in a definite order from n distinct objects, is denoted by the symbol ${}^{n}P_{r}$.

To derive the computational formula for ${}^{n}P_{r}$ $(r \le n)$, we proceed as below:

The first object may be chosen in n ways, and corresponding to each way of first selection, the second object may be chosen in $(n - 1)$ ways. Similarly, once selections have been made for both first and second objects, there are $(n - 2)$ objects left and the third object may be selected in $(n - 2)$ ways. Continuing in this way, the r-th object may be selected in $(n - r + 1)$ ways. Thus the first r objects may be chosen in $n(n-1)(n-2)\cdots(n-r+1)$ ways.

Hence ${}^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1)$.

In particular, when $r = n$, ${}^{n}P_{n} = n(n-1)(n-2)\cdots 3 \times 2 \times 1 = n!$ (read "n factorial").

It is relevant to note that $1! = 1$ and that $0! = 1$.

The expression for ${}^{n}P_{r}$, on multiplying by $\frac{(n-r)!}{(n-r)!}$, may be written as

${}^{n}P_{r} = \frac{n!}{(n-r)!}$

The number of permutations of n objects, selected all at a time, when the n objects consist of $n_1$ of one kind, $n_2$ of a second kind, ..., $n_k$ of a k-th kind $(\sum n_i = n)$, is

$P = \frac{n!}{n_1!\, n_2! \cdots n_k!}$
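The two permutation formulas can be checked numerically. The sketch below is illustrative; the function names `n_p_r` and `arrangements_with_repeats` are hypothetical, and the word MISSISSIPPI (1 M, 4 I, 4 S, 2 P) is an assumed example that does not come from the text. `math.perm` is available in Python 3.8 and later.

```python
import math
from itertools import permutations

def n_p_r(n, r):
    """Number of permutations of r objects chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

def arrangements_with_repeats(counts):
    """n! / (n1! n2! ... nk!) for objects that come in k kinds."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total

# Formula, library function and brute-force listing should agree.
print(n_p_r(4, 3), math.perm(4, 3), len(list(permutations("ABCD", 3))))

# Distinct arrangements of the letters of MISSISSIPPI.
print(arrangements_with_repeats([1, 4, 4, 2]))
```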


(c) Rule of Combination. A combination is any subset of r objects, selected without regard to their order, from a set of n distinct objects. The total number of such combinations is denoted by the symbol ${}^{n}C_{r}$ or $\binom{n}{r}$ (read "n above r"), where $r \le n$.

The computational formula for ${}^{n}C_{r}$ is derived as below:

Let there be $\binom{n}{r}$ possible combinations of r objects, selected from n distinct objects. Each combination of r objects, if order is not disregarded, can be arranged in $r!$ different orders and thus makes $r!$ different permutations. The number of permutations or ordered subsets that can be made from the $\binom{n}{r}$ combinations, each having r objects, is therefore $r!\binom{n}{r}$. But the total number of permutations of r objects, selected from n objects, is ${}^{n}P_{r}$. Thus we conclude that $r!\binom{n}{r} = {}^{n}P_{r}$, or

$\binom{n}{r} = \frac{{}^{n}P_{r}}{r!} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)(r-2)\cdots 3 \times 2 \times 1} = \frac{n!}{r!\,(n-r)!}$

The quantity $\binom{n}{r}$ or ${}^{n}C_{r}$ is also called a binomial co-efficient because of its appearance in the binomial expansion of $(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^{r}$. The binomial co-efficient has two important properties:

$\binom{n}{r} = \binom{n}{n-r}$ and $\binom{n}{r} + \binom{n}{r-1} = \binom{n+1}{r}$

The values of factorials may be conveniently computed by using an approximation known as Stirling's formula,

$n! \approx \sqrt{2\pi n}\; n^{n} e^{-n}$

where $\pi = 3.1416$ and $e = 2.718$.
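The combination formula, the two properties of the binomial co-efficient and Stirling's approximation can all be checked numerically. The following is a minimal sketch; the function names `n_c_r` and `stirling` and the values n = 10, r = 4 are assumptions made for illustration.

```python
import math

def n_c_r(n, r):
    """Number of combinations: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * n^n * e^(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

n, r = 10, 4
print(n_c_r(n, r) == math.comb(n, r))                       # agrees with the library
print(n_c_r(n, r) == n_c_r(n, n - r))                       # symmetry property
print(n_c_r(n, r) + n_c_r(n, r - 1) == n_c_r(n + 1, r))     # addition property
print(stirling(10) / math.factorial(10))                    # ratio close to, but below, 1
```

The ratio of the approximation to the exact factorial moves closer to 1 as n grows, which is why the formula is mainly useful for large n.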


Example 6.1 A club consists of four members. How many sample points are in the sample space when three officers: president, secretary and treasurer, are to be chosen?

It is evident that the order in which the 3 officers are to be chosen is of significance. Thus there are 4 choices for the first office, 3 choices for the second office and 2 choices for the third office. Hence the total number of sample points is 4 x 3 x 2 = 24. In other words, the number of permutations is

${}^{4}P_{3} = \frac{4!}{(4-3)!} = 4 \times 3 \times 2 = 24$

Let the four members be A, B, C and D. Then a tree diagram, which provides an organized way of listing all possible arrangements, can be drawn for this example:

[Tree diagram with columns: President, Secretary, Treasurer, Sample Space]


Example 6.2 A three-person committee is to be formed from a list of four persons. How many sample points are associated with the experiment?

Since the order in which the three persons of the committee are chosen is unimportant, it is therefore an example of a problem involving combinations. Thus the desired number of combinations is

$\binom{4}{3} = \frac{4!}{3!\,(4-3)!} = 4$

In other words, the sample space associated with the experiment contains 4 sample points.

These two examples serve to illustrate the difference between a permutation and a combination.


Example 6.3 How many sample points are in the sample space when a person draws a hand of 5 cards from a well-shuffled ordinary deck of 52 cards?

The total number of sample points is given by

$\binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1} = 2{,}598{,}960$
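The counts in Examples 6.1 to 6.3 can be reproduced by listing or by formula. The sketch below uses Python's standard library; the variable names are chosen for illustration.

```python
import math
from itertools import combinations, permutations

members = ["A", "B", "C", "D"]

# Example 6.1: order matters (president, secretary, treasurer).
officer_slates = list(permutations(members, 3))

# Example 6.2: order does not matter (a committee of three).
committees = list(combinations(members, 3))

print(len(officer_slates), len(committees))
print(committees)

# Example 6.3: number of 5-card hands from a 52-card deck.
print(math.comb(52, 5))
```

Each committee of three corresponds to 3! = 6 different officer slates, which is the factor separating the two counts.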


6.4 DEFINITIONS OF PROBABILITY

Probability can be discussed from two points of view: the objective and the subjective approach. Objective probability can be classified into the following categories, each of which is briefly described as follows:

(a) The Classical or A Priori Definition of Probability is given as follows:

If a random experiment can produce n mutually exclusive and equally likely outcomes and if m out of these outcomes are considered favourable to the occurrence of a certain event A, then the probability of the event A, denoted by $P(A)$, is defined as the ratio $\frac{m}{n}$. Symbolically, we write

$P(A) = \frac{m}{n} = \frac{\text{Number of favourable outcomes}}{\text{Total number of possible outcomes}}$

This definition was formulated by the French mathematician P.S. Laplace (1749-1827) and can be conveniently used in experiments where the total number of possible outcomes and the number of outcomes favourable to an event can be determined.

The classical definition has the following shortcomings:

i) This definition is said to involve circular reasoning as the term equally likely really means equally probable. Thus probability is defined by introducing concepts that presume a prior knowledge of the meaning of probability.

ii) This definition is not applicable when the assumption of equally likely does not hold.

iii) This definition becomes vague when the number of possible outcomes may be infinite.


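(b) The Relative Frequency or A Posteriori Definition of Probability. If a random experiment is repeated a large number of times, say n, under identical conditions and if an event A is observed to occur m times, then the probability of the event A is defined as the limit of the relative frequency $\frac{m}{n}$ as n tends to infinity.

This definition assumes that as n increases indefinitely, the ratio $\frac{m}{n}$ tends to become stable at the numerical value $P(A)$. To investigate the extent of the long-run stability, several coin-tossing experiments were performed. The results of a few well known experiments are shown below:

[Table: coin-tossing experiments by Buffon and K. Pearson (two experiments), giving the number of times the coin was tossed (n); the numerical entries are not recoverable]

It is quite obvious that the value of the ratio $\frac{m}{n}$ fluctuates about the number 0.5 and becomes closer to 0.5 as the number of throws increases. This sort of a long-run frequency property provides a basis of the theory of probability.

This definition is also called the statistical or empirical definition of probability as it is based on empirical data. It is more useful for practical problems.

This definition too has certain limitations as the conditions under which an experiment is performed may change from trial to trial and it is not possible, in practice, to repeat the experiment an infinite number of times and hence the ratio $\frac{m}{n}$ may not be unique.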
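The long-run behaviour of the ratio m/n can be explored by simulation. The sketch below is an illustration only: it uses a pseudo-random generator with a fixed seed in place of a physical coin, and the function name and the chosen numbers of tosses are assumptions. The printed ratios will differ for other seeds, which mirrors the remark above that the ratio is not unique.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return m/n, the proportion of heads."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```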


(c) The Axiomatic Definition of Probability. This definition, introduced in 1933 by the Russian mathematician Andrei N. Kolmogorov (1903-1987), is based on a set of axioms, where an axiom is a statement that is assumed to be true. Let S be a sample space with the sample points $E_1, E_2, \ldots, E_n$. To each sample point, we assign a real number, denoted by the symbol $P(E_i)$, and called the probability of $E_i$, that must satisfy the following basic axioms:

Axiom (i). For any event $E_i$, $0 \le P(E_i) \le 1$.
Axiom (ii). $P(S) = 1$ for the sure event S.
Axiom (iii). If A and B are mutually exclusive events (subsets of S), then $P(A \cup B) = P(A) + P(B)$.

It is to be emphasized that the axiomatic theory of probability assumes that some probability, defined as a non-negative real number, is to be attached to each sample point $E_i$ such that the sum of all such numbers must equal one. The assignment of probabilities may be based on past evidence or on some other underlying conditions.

Probability of an event. If an event A is defined in a sample space S, then its probability $P(A)$ is equal to the sum of the probabilities of all sample points that are included in A, i.e. $P(A) = \sum_{E_i \in A} P(E_i)$.

If all the n possible outcomes of a random experiment are equally likely to occur, then each outcome is assigned the same probability $\frac{1}{n}$, e.g. in throwing a die once, $P(1) = P(2) = \cdots = P(6) = \frac{1}{6}$.

It follows from axiom (iii) that for any event A containing m equally likely outcomes (sample points), we have

$P(A) = \frac{m}{n} = \frac{\text{Number of sample points in } A}{\text{Number of sample points in } S} = \frac{n(A)}{n(S)}$

It is interesting to note that in the classical definition of probability,

i) $P(A) = \frac{m}{n} \ge 0$;

ii) $P(S) = \frac{n}{n} = 1$;

iii) If A and B are mutually exclusive events, and $m_1$ and $m_2$ outcomes are favourable to A and B respectively, then

$P(A \cup B) = \frac{m_1 + m_2}{n} = \frac{m_1}{n} + \frac{m_2}{n} = P(A) + P(B)$

Thus the classical definition of probability also satisfies Kolmogorov's axioms. It is, therefore, possible to conclude that the probability is always a number between zero and one (inclusive).
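For a finite sample space, the axioms can be checked mechanically. The following is a minimal sketch under the assumption that probabilities are attached to individual sample points and stored in a dictionary; the function names `is_valid_assignment` and `prob` and the chosen events are hypothetical. Exact fractions are used to avoid rounding error.

```python
from fractions import Fraction

def is_valid_assignment(point_probs):
    """Each point probability lies in [0, 1] and the probabilities sum to one."""
    values = point_probs.values()
    return all(0 <= p <= 1 for p in values) and sum(values) == 1

def prob(event, point_probs):
    """P(A): sum of the probabilities of the sample points contained in A."""
    return sum(point_probs[e] for e in event)

# A fair die: six equally likely sample points.
die = {face: Fraction(1, 6) for face in range(1, 7)}
even, odd_low = {2, 4, 6}, {1, 3}          # two mutually exclusive events

print(is_valid_assignment(die))
print(prob(even, die))
print(prob(even | odd_low, die) == prob(even, die) + prob(odd_low, die))
```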
6.4.1 Subjective or Personalistic Probability. As its name suggests, the subjective or personalistic probability is a measure of the strength of a person's belief regarding the occurrence of an event A. Probability in this sense is purely subjective and is based on whatever evidence is available to the individual. This definition, being flexible, may be applied to those real-world situations where neither an equally likely nor a relative frequency approach is possible. The subjective probability has a disadvantage that two or more persons faced with the same evidence may arrive at different probabilities.


Example 6.4 If a card is drawn from an ordinary deck of 52 playing cards, find the probability that (i) the card is a red card, (ii) the card is a diamond, (iii) the card is a 10.

The total number of possible outcomes is 52, and we assume that all possible outcomes are equally likely.

i) Let A represent the event that the card drawn is a red card. Then the number of outcomes favourable to the event A is 26 since there are 26 red cards.

Hence $P(A) = \frac{m}{n} = \frac{\text{Number of favourable outcomes}}{\text{Total number of possible outcomes}} = \frac{26}{52} = \frac{1}{2}$.

ii) Let B denote the event that the card drawn is a diamond. Then the number of outcomes favourable to the event B is 13 since there are 13 diamonds.

Hence $P(B) = \frac{13}{52} = \frac{1}{4}$.

iii) Let C denote the event that the card drawn is a 10. Then the number of outcomes favourable to C is 4 as there are four 10's.

Thus $P(C) = \frac{4}{52} = \frac{1}{13}$.
Essmple 6.5 A fair com is N three fimes. What is the probability that at least one head 


sample space for thigg periment is 

5= {HHH, HHT, HTH, THH, HTT, THT, ТІН, ТТТ) 

ж5)-8. 

Же coin is fair, we assume that each of these outcomes is equally likely to appear. Therefore we 


probability of i to each outcome. 


4 denote the event that at least one head appears. Then 
A= (HHH, HHT, HTH, THH, HTT, THT, TTH} 
n(A)=7. 


Hence P(A) = n(A)/n(S) = 7/8.

Example 6.6 If two fair dice are thrown, what is the probability of getting (i) a double six, (ii) a sum of 8 or more dots?

The sample space S is represented by the following 36 outcomes:

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

As the dice are fair, each of these 36 outcomes is equally likely, and a probability of 1/36 is attached to each outcome.

i) Let A represent the event that a double six occurs.
Then A = {(6, 6)} and thus P(A) = 1/36.

ii) Let B denote the event that a sum of 8 or more dots occurs.

Then B = {(6, 2), (5, 3), (4, 4), (3, 5), (2, 6), (6, 3), (5, 4), (4, 5),
          (3, 6), (6, 4), (5, 5), (4, 6), (6, 5), (5, 6), (6, 6)},
i.e. n(B) = 15.

Hence P(B) = 15/36 = 5/12.
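The counting in Example 6.6 can be checked by listing the sample space mechanically. The following is only a minimal sketch in Python; the helper name `probability` is an illustrative choice and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# (i) a double six, (ii) a sum of 8 or more dots
p_double_six = probability(lambda outcome: outcome == (6, 6))
p_sum_8_or_more = probability(lambda outcome: sum(outcome) >= 8)

print(p_double_six)     # 1/36
print(p_sum_8_or_more)  # 5/12
```

Exact fractions are used so that the results can be compared directly with 1/36 and 15/36 = 5/12.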
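Example 6.7 Six white balls and four black balls, which are indistinguishable apart from colour, are placed in a bag. If six balls are taken from the bag, find the probability of their being three white and three black.

The sample space S contains C(10, 6) = 10!/(6! (10 - 6)!) = 210 sample points, where C(n, r) denotes the number of combinations of n objects taken r at a time.

Let A represent the event that three white and three black balls are taken. Then the number of outcomes that correspond to the event A is C(6, 3) × C(4, 3) = 20 × 4 = 80.

Therefore P(A) = 80/210 = 8/21.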


Example 6.8 An employer wishes to hire three people from a group of 15 applicants, 8 men and 7 women, all of whom are equally qualified to fill the position. If he selects the three at random, what is the probability that (i) all three will be men, (ii) at least one will be a woman? (P.U., M.A. ...)

The sample space S contains C(15, 3) = 455 sample points.

i) Let A represent the event that the three selected will be men. Then A contains C(8, 3) = 56 sample points, the number of ways in which 3 men can be selected from 8 men.

Therefore P(A) = n(A)/n(S) = 56/455 = 8/65.

ii) At least one woman means one, two or three women. Let B denote the event that at least one woman is selected. Then B contains

C(7, 1) C(8, 2) + C(7, 2) C(8, 1) + C(7, 3) C(8, 0)
= 196 + 168 + 35 = 399 sample points.

Hence P(B) = n(B)/n(S) = 399/455 = 57/65.


Example 6.9 Four items are taken at random from a box of 12 items and inspected. The box is rejected if more than 1 item is found to be faulty. If there are 3 faulty items in the box, find the probability that the box is accepted.

The sample space S contains C(12, 4) = 495 sample points.

The box contains 3 faulty and 9 good items. The box is accepted if there is (i) no faulty item, or (ii) one faulty item in the sample of 4 items.

Let A denote the event that the number of faulty items chosen is 0 or 1. Then A contains

C(3, 0) C(9, 4) + C(3, 1) C(9, 3) = 126 + 252 = 378 sample points.

Therefore P(A) = 378/495 = 0.76.

Hence the probability that the box is accepted is 0.76.
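The acceptance rule of Example 6.9 can be written as a short calculation. This is a sketch under the assumptions of the example (sampling without replacement, all samples equally likely); the function name `acceptance_probability` and its parameter names are hypothetical.

```python
from fractions import Fraction
from math import comb

def acceptance_probability(box_size, faulty, sample_size, max_faulty_allowed):
    """P(at most max_faulty_allowed faulty items in a random sample)."""
    good = box_size - faulty
    total = comb(box_size, sample_size)
    favourable = sum(
        comb(faulty, k) * comb(good, sample_size - k)
        for k in range(max_faulty_allowed + 1)
    )
    return Fraction(favourable, total)

# Box of 12 items with 3 faulty; sample of 4; accept if 0 or 1 faulty item.
p_accept = acceptance_probability(12, 3, 4, 1)
print(p_accept, round(float(p_accept), 2))  # 42/55 0.76
```

The fraction 42/55 is 378/495 in lowest terms.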


LAWS OF PROBABILITY


The following are some of the basic rules for the calculation of probability. These rules have many applications.


Theorem 6.1 If φ is the impossible event, then P(φ) = 0.

Proof. The sure event S and the impossible event φ are mutually exclusive and their union is
S ∪ φ = S

Then P(S) = P(S ∪ φ) = P(S) + P(φ)

Subtracting P(S) from both sides, we get

P(φ) = 0

Thus the probability of the impossible event is zero. It is to be kept in mind that the converse of this rule is not generally true. Moreover, a theorem is a statement derived either from axioms or from previously proved theorems.


Theorem 6.2 Law of Complementation. If Ā is the complement of an event A relative to the sample space S, then

P(Ā) = 1 - P(A).

Proof. Since the events A and Ā are mutually exclusive and collectively exhaustive, i.e. together they make up the entire sample space S, we have

A ∪ Ā = S
Thus P(A ∪ Ā) = P(S)
or P(A) + P(Ā) = 1 [∵ P(S) = 1 by axiom (ii)]
or P(Ā) = 1 - P(A).

Hence the probability of the complement of an event is equal to one minus the probability of the event. Complementary probabilities are very useful when the question asks for the probability of "at least one".


Example 6.10 A coin is tossed 4 times in succession. What is the probability that at least one head occurs?

The sample space S for this experiment consists of 2^4 = 16 sample points, as each toss can result in 2 outcomes, and we assume that each outcome is equally likely.

Let A represent the event that at least one head occurs. Then A consists of many sample points. On the other hand, Ā is the event that no head occurs, and Ā has the single sample point (TTTT), so that P(Ā) = 1/16.

Hence by the law of complementation, we have
P(A) = 1 - P(Ā) = 1 - 1/16 = 15/16.


Example 6.11 A coin is biased so that the probability that it falls showing tails is 3/4.

a) Find the probability of obtaining at least one head when the coin is tossed five times.

b) How many times must the coin be tossed so that the probability of obtaining at least one head is greater than 0.98?

Here P(a head appears) = 1/4 and
P(no head, i.e. a tail, appears) = 3/4.

a) Let A be the event that at least one head is obtained when the coin is tossed 5 times, and Ā the event that no head is obtained. Then by the law of complementation, we have

P(A) = 1 - P(Ā)
= 1 - (3/4)^5 = 1 - 0.237 = 0.763

b) Let the coin be tossed n times to obtain the probability of at least one head greater than 0.98.

Then 1 - (3/4)^n > 0.98, i.e. (3/4)^n < 0.02

Taking logs, we have

n log(3/4) < log 0.02

Dividing both sides by log(3/4) and reversing the inequality sign, as log(3/4) is negative, we have

n > log 0.02 / log 0.75 = (-1.6990)/(-0.1249) = 13.6

Hence n = 14, i.e. the coin should be tossed 14 times so that the probability of obtaining at least one head is greater than 0.98.
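Both parts of Example 6.11 can be verified numerically by applying the law of complementation directly. The sketch below searches for the smallest n instead of using logarithms; the names `p_at_least_one_head` and `min_tosses` are illustrative only.

```python
from fractions import Fraction

P_TAIL = Fraction(3, 4)  # probability of a tail on a single toss

def p_at_least_one_head(n):
    """Law of complementation: 1 - P(no head in n tosses)."""
    return 1 - P_TAIL ** n

def min_tosses(threshold):
    """Smallest n for which P(at least one head) exceeds the threshold."""
    n = 1
    while p_at_least_one_head(n) <= threshold:
        n += 1
    return n

p_five = p_at_least_one_head(5)
n_needed = min_tosses(Fraction(98, 100))

print(round(float(p_five), 3))  # 0.763
print(n_needed)                 # 14
```

The search agrees with the logarithmic solution: 13 tosses are not enough, since (3/4)^13 is still larger than 0.02.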


Theorem 6.3 Probability of Subevent. If A and B are two events such that A ⊂ B, then
P(A) ≤ P(B).

Proof. For A ⊂ B, the event B may be written as the union of two mutually exclusive events B ∩ A and B ∩ Ā,

i.e. B = (B ∩ A) ∪ (B ∩ Ā),
so B = A ∪ (B ∩ Ā), since B ∩ A = A.

Therefore P(B) = P(A) + P(B ∩ Ā)

But P(B ∩ Ā) ≥ 0.
Hence P(A) ≤ P(B).

(Venn diagram: B ∩ Ā is shaded.)

It is to be noted that an event such as A ∩ B is called a joint event, and the probability associated with a joint event is called a joint probability.


Theorem 6.4 If A and B are any two events defined in a sample space S, then
P(A ∩ B̄) = P(A) - P(A ∩ B).

Proof. The events A ∩ B̄ and A ∩ B are mutually exclusive and their union is A (see the Venn diagram). That is

A = (A ∩ B̄) ∪ (A ∩ B),
so P(A) = P(A ∩ B̄) + P(A ∩ B)

Hence P(A ∩ B̄) = P(A) - P(A ∩ B)

(Venn diagram: A is shaded.)

Theorem 6.5 Addition Law. If A and B are any two events defined in a sample space S, then

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Proof. The event A ∪ B may be written as the union of two mutually exclusive events A and B ∩ Ā:

A ∪ B = A ∪ (B ∩ Ā)

Then P(A ∪ B) = P(A) + P(B ∩ Ā)

Again the event B may also be decomposed into two mutually exclusive events as

B = (A ∩ B) ∪ (Ā ∩ B)

Thus P(B) = P(A ∩ B) + P(Ā ∩ B)

(Venn diagram showing the regions A ∩ B̄, A ∩ B and B ∩ Ā; A ∪ B is shaded.)

Subtracting this result from the former, we get
P(A ∪ B) - P(B) = P(A) - P(A ∩ B)

Hence P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
This law, often called the General Rule of Addition for probabilities, may be stated as below:

"If two events A and B are not mutually exclusive, then the probability that at least one of them occurs is given by the sum of the separate probabilities of events A and B minus the probability of the joint event A ∩ B."


Corollary 1. If A and B are mutually exclusive events, then
P(A ∪ B) = P(A) + P(B)
Proof. Since the events A and B are mutually exclusive, therefore
A ∩ B = φ and P(A ∩ B) = P(φ) = 0

Hence P(A ∪ B) = P(A) + P(B), which is just a restatement of axiom (iii).

Alternatively. Let n be the total number of sample points, and A and B be two events; A consisting of m1 sample points and B of m2 sample points. The occurrence of A ∪ B is an event which consists of all the sample points belonging to A or B.

Since A and B are mutually exclusive events, they have no sample points in common. Obviously, the number of points contained in A ∪ B is m1 + m2.

Hence P(A ∪ B) = (number of sample points in A ∪ B)/(number of sample points in S)
= (m1 + m2)/n = m1/n + m2/n = P(A) + P(B).

Corollary 2. If A1, A2, ..., Ak are k mutually exclusive events, then the probability that one of them occurs is the sum of the probabilities of the separate events, i.e.

P(A1 ∪ A2 ∪ ... ∪ Ak) = P(A1) + P(A2) + ... + P(Ak)

It is to be noted that if A1, A2, ..., Ak are mutually exclusive and collectively exhaustive, then
P(A1) + P(A2) + ... + P(Ak) = 1

Corollary 3. If A and B are any two events, then
P(A ∪ B) ≤ P(A) + P(B).

Proof. The addition law for any two events A and B may be written as

P(A ∪ B) + P(A ∩ B) = P(A) + P(B)

Since P(A ∩ B) ≥ 0, it follows that

P(A ∪ B) ≤ P(A) + P(B)

In general, for any events A1, A2, ..., Ak, the relation

P(A1 ∪ A2 ∪ ... ∪ Ak) ≤ P(A1) + P(A2) + ... + P(Ak)

holds. This result is known as Boole's inequality.

Example 6.12 If one card is selected at random from a deck of 52 playing cards, what is the probability that the card is a club or a face card or both?

Let A represent the event that the card selected is a club, B the event that the card selected is a face card, and A ∩ B the event that the card selected is both a club and a face card. Then we need P(A ∪ B).

Now P(A) = 13/52, as there are 13 clubs,

P(B) = 12/52, as there are 12 face cards,
and P(A ∩ B) = 3/52, since 3 of the clubs are also face cards.
Therefore the desired probability is
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= 13/52 + 12/52 - 3/52 = 22/52 = 11/26.


Example 6.13 An integer is chosen at random from the first 200 positive integers. What is the probability that the integer chosen is divisible by 6 or by 8?

The sample space S is
S = {1, 2, 3, ..., 199, 200}, and therefore n(S) = 200

Let A represent the event that the integer chosen is divisible by 6, B the event that the integer chosen is divisible by 8, and A ∩ B the event that the integer chosen is divisible by both 6 and 8, i.e. by 24.

Then we need P(A ∪ B).

Now n(A) = [200/6] = 33, n(B) = [200/8] = 25 and n(A ∩ B) = [200/24] = 8, where [x] denotes the greatest integer not exceeding x.

Hence P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= 33/200 + 25/200 - 8/200 = 50/200 = 1/4.
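The counts used in Example 6.13 can be confirmed by brute force, which also checks the addition law against a direct count of the union. This is a small illustrative sketch; the variable names are not from the text.

```python
from fractions import Fraction

integers = range(1, 201)  # the first 200 positive integers

n_a = sum(1 for i in integers if i % 6 == 0)    # divisible by 6
n_b = sum(1 for i in integers if i % 8 == 0)    # divisible by 8
n_ab = sum(1 for i in integers if i % 24 == 0)  # divisible by both 6 and 8

# Direct count of the union, for comparison with the addition law.
n_union = sum(1 for i in integers if i % 6 == 0 or i % 8 == 0)

p_union = Fraction(n_a + n_b - n_ab, len(integers))

print(n_a, n_b, n_ab)  # 33 25 8
print(p_union)         # 1/4
```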


Example 6.14 A pair of dice is thrown. Find the probability of getting a total of either 5 or 11.
The sample space contains 36 outcomes when two dice are thrown (see Example 6.6).

Let A be the event that a total of 5 occurs and B be the event that a total of 11 occurs. Then the events are

A = {(1, 4), (2, 3), (3, 2), (4, 1)}, and
B = {(5, 6), (6, 5)}.

The events A and B are mutually exclusive, as a total of 5 and a total of 11 cannot both occur at once.
Therefore

P(A ∪ B) = P(A) + P(B) = 4/36 + 2/36 = 6/36 = 1/6.


Example 6.15 Three horses A, B and C are in a race; A is twice as likely to win as B and B is twice as likely to win as C. What is the probability that A or B wins?

Let P(C) = p, as the events are not equally likely.
Then P(B) = 2p, as B is twice as likely to win as C.
Similarly P(A) = 2P(B) = 4p.

Since A, B and C are mutually exclusive and collectively exhaustive, the sum of their probabilities must be equal to 1. Thus,

p + 2p + 4p = 1 or p = 1/7

so that P(A) = 4/7, P(B) = 2/7 and P(C) = 1/7.

Hence P(A ∪ B) = P(A) + P(B) = 4/7 + 2/7 = 6/7.


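Theorem 6.6 If A, B and C are any three events in a sample space S, then the probability of at least one of them occurring is given by

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)

Proof. Let us write A ∪ B ∪ C = A ∪ (B ∪ C)
= A ∪ D, where D = B ∪ C.
Then P(A ∪ B ∪ C) = P(A ∪ D)
= P(A) + P(D) - P(A ∩ D), by the addition law,
= P(A) + P(B ∪ C) - P[A ∩ (B ∪ C)]
= P(A) + P(B) + P(C) - P(B ∩ C) - P[A ∩ (B ∪ C)]
The distributive law of sets gives
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
The two sets A ∩ B and A ∩ C are not necessarily disjoint; their intersection is A ∩ B ∩ C. Therefore
P[A ∩ (B ∪ C)] = P[(A ∩ B) ∪ (A ∩ C)]
= P(A ∩ B) + P(A ∩ C) - P(A ∩ B ∩ C)
Substitution of this result gives
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C)
This may be written as
P(A1 ∪ A2 ∪ A3) = Σ P(Ai) - Σ P(Ai ∩ Aj) + P(A1 ∩ A2 ∩ A3), where the second sum is taken over all pairs i < j.

In general, the formula is
P(A1 ∪ A2 ∪ ... ∪ Ak) = Σ P(Ai) - Σ P(Ai ∩ Aj) + Σ P(Ai ∩ Aj ∩ Al) - ... + (-1)^(k+1) P(A1 ∩ A2 ∩ ... ∩ Ak),
where the sums are taken over i < j, i < j < l, and so on.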

Example 6.16 A card is drawn at random from a deck of ordinary playing cards. What is the probability that it is a diamond, a face card or a king?

Let A represent the event that the card drawn is a diamond, B the event that the card drawn is a face card, C the event that the card drawn is a king, A ∩ B the event that the card drawn is a diamond and a face card, and so on. Then we need

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C).

Now P(A) = n(A)/n(S) = 13/52 (there are 13 diamonds)
P(B) = 12/52 (there are 12 face cards)
P(C) = 4/52 (there are 4 kings)
P(A ∩ B) = 3/52 (diamond and face card)
P(B ∩ C) = 4/52 (face card and king)
P(A ∩ C) = 1/52 (diamond and king)
P(A ∩ B ∩ C) = 1/52 (diamond and face card and king)

Hence, we get
P(A ∪ B ∪ C) = 13/52 + 12/52 + 4/52 - 3/52 - 4/52 - 1/52 + 1/52 = 22/52 = 0.423.
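The three-event addition law can be checked on the deck of Example 6.16 by building the events as sets and comparing the formula with a direct count of the union. The card encoding below (rank, suit pairs) is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

A = {card for card in deck if card[1] == "diamonds"}       # diamond
B = {card for card in deck if card[0] in {"J", "Q", "K"}}  # face card
C = {card for card in deck if card[0] == "K"}              # king

def p(event):
    """Probability of an event as favourable cards / 52."""
    return Fraction(len(event), len(deck))

direct = p(A | B | C)
by_formula = (p(A) + p(B) + p(C)
              - p(A & B) - p(B & C) - p(A & C)
              + p(A & B & C))

print(direct, by_formula)  # 11/26 11/26
```

Both routes give 22/52 = 11/26, which is about 0.423.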


6.6 CONDITIONAL PROBABILITY 


The sample space for an experiment must often be changed when some additional information pertaining to the outcome of the experiment is received. The effect of such information is to reduce the sample space by excluding some outcomes as being impossible which, before receiving the information, were believed possible. The probabilities associated with such a reduced sample space are called conditional probabilities. The following example illustrates the concept of conditional probability.


Let us consider the die-throwing experiment with sample space S = {1, 2, 3, 4, 5, 6}. Suppose we wish to know the probability of the outcome that the die shows 6, say event A. If, before seeing the outcome, we are told that the die shows an even number of dots, say event B, then the information that the die shows an even number excludes the outcomes 1, 3 and 5, and thereby reduces the original sample space to a sample space that consists of the 3 outcomes 2, 4 and 6, i.e. the reduced sample space is B = {2, 4, 6}. Then the desired probability in the reduced sample space B is 1/3, since each outcome in the reduced sample space is equally likely. We call this 1/3 the conditional probability of the event A, because it is computed under the condition that the die has shown an even number of dots. In other words,

P(die shows 6 | die shows an even number) = 1/3

where the vertical line is read as given that, and the information following the vertical line describes the conditioning event. This is in fact the probability of getting a 6 in the reduced sample space B, and is denoted as P(A/B). Thus we have the following:

P(A/B) = (number of sample points in A ∩ B)/(number of sample points in B)
= n(A ∩ B)/n(B)

Dividing the numerator and the denominator by the number of sample points in the original sample space, n(S), we get

P(A/B) = [n(A ∩ B)/n(S)] / [n(B)/n(S)] = P(A ∩ B)/P(B)
This illustration leads us to make the following definition of conditional probability.

If A and B are two events in a sample space S and if P(B) is not equal to zero, then the conditional probability of the event A given that event B has occurred, written as P(A/B), is defined by

P(A/B) = P(A ∩ B)/P(B)

If P(B) = 0, the conditional probability P(A/B) remains undefined.

Similarly, P(B/A) = P(A ∩ B)/P(A), where P(A) > 0.

It should be noted that P(A/B) satisfies all the basic axioms of probability, namely
i) 0 ≤ P(A/B) ≤ 1.

ii) P(S/B) = 1 [∵ S ∩ B = B, ∴ P(S/B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1]

iii) P(A1 ∪ A2 / B) = P(A1/B) + P(A2/B), provided the events A1 and A2 are mutually exclusive.


Thus to determine the conditional probability P(A/B), either we directly calculate the probability of A relative to the reduced sample space B, or we use P(A ∩ B) and P(B), the probabilities of events in the original sample space S.

Example 6.17 Two coins are tossed. What is the conditional probability that two heads appear, given that there is at least one head?

The sample space S for this experiment is
S = {HH, HT, TH, TT}.

Let A represent the event that two heads appear, and B the event that there is at least one head. Then we need P(A/B).

Since A = {HH}, B = {HH, HT, TH} and A ∩ B = {HH}, we have

P(A) = 1/4, P(B) = 3/4 and P(A ∩ B) = 1/4.

Hence P(A/B) = P(A ∩ B)/P(B) = (1/4)/(3/4) = 1/3.


Example 6.18 A man tosses two fair dice. What is the conditional probability that the sum of the two dice will be 7, given that (i) the sum is odd, (ii) the sum is greater than 6, (iii) the two dice had the same outcome? (P.U., B.A./B...)

The sample space S for this experiment consists of the following 36 equally likely outcomes:

S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
     (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
     (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
     (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}.

Let A = {the sum is 7}, B = {the sum is odd},

C = {the sum is greater than 6}, and

D = {the two dice had the same outcome}. Then
A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
B = {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), ..., (6, 5)},
C = {(1, 6), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 3), (4, 4), ..., (6, 6)},
D = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)},
A ∩ B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
A ∩ C = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
A ∩ D = φ.

Thus P(A) = 6/36, P(B) = 18/36, P(C) = 21/36, P(D) = 6/36,

P(A ∩ B) = 6/36, P(A ∩ C) = 6/36 and P(A ∩ D) = 0.

Using the definition of conditional probability, we get

i) P(A/B) = P(A ∩ B)/P(B) = (6/36)/(18/36) = 1/3

ii) P(A/C) = P(A ∩ C)/P(C) = (6/36)/(21/36) = 6/21 = 2/7

iii) P(A/D) = P(A ∩ D)/P(D) = 0/(6/36) = 0
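The idea of a reduced sample space in Example 6.18 translates directly into set operations: P(A/B) is the share of the outcomes in B that also lie in A. The sketch below assumes equally likely outcomes; the helper name `conditional` is illustrative and returns None when the conditioning event is empty, mirroring the undefined case P(B) = 0.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def conditional(a, b):
    """P(A/B) = n(A and B) / n(B); undefined (None) when B is empty."""
    if not b:
        return None
    return Fraction(len(a & b), len(b))

A = {o for o in S if sum(o) == 7}      # the sum is 7
B = {o for o in S if sum(o) % 2 == 1}  # the sum is odd
C = {o for o in S if sum(o) > 6}       # the sum is greater than 6
D = {o for o in S if o[0] == o[1]}     # both dice show the same outcome

print(conditional(A, B))  # 1/3
print(conditional(A, C))  # 2/7
print(conditional(A, D))  # 0
```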


Example 6.19 What is the probability that a randomly selected poker hand contains exactly 3 aces, given that it contains at least 2 aces?

Let A represent the event that exactly 3 aces are selected and B the event that at least 2 aces are selected. Then we need P(A/B).

Since a poker hand consists of 5 cards, the sample space S contains C(52, 5) = 2,598,960 sample points. Also

n(A) = C(4, 3) C(48, 2) = 4512, and
n(B) = C(4, 2) C(48, 3) + C(4, 3) C(48, 2) + C(4, 4) C(48, 1) = 108,336.

Since A ⊂ B, we have A ∩ B = A, so that

P(A ∩ B) = C(4, 3) C(48, 2) / C(52, 5)

P(B) = [C(4, 2) C(48, 3) + C(4, 3) C(48, 2) + C(4, 4) C(48, 1)] / C(52, 5)

Hence P(A/B) = P(A ∩ B)/P(B) = 4512/108,336 = 0.0416.

Theorem 6.7 Multiplication Law. If A and B are any two events defined in a sample space S, then

P(A ∩ B) = P(A) P(B/A), provided P(A) ≠ 0,
= P(B) P(A/B), provided P(B) ≠ 0.

Proof. The conditional probability of B given that A has occurred is

P(B/A) = P(A ∩ B)/P(A), where P(A) ≠ 0

Multiplying both sides by P(A), we get

P(A ∩ B) = P(A) P(B/A).

The second form is easily obtained by interchanging A and B.
This is called the general rule of multiplication for probabilities.
Alternative Proof. Let S be a sample space of an experiment having n equally likely outcomes (sample points). Let m1 be the number of sample points contained in A (including those common to B), m2 be the number of sample points in B, and m3 be the number of sample points belonging both to A and B. Then (assuming m1 > 0, m2 > 0),

P(A ∩ B) = m3/n

The fraction m3/n may be written as (m1/n) × (m3/m1).
But m1/n = P(A) and m3/m1 = conditional probability of B, given that A has occurred, i.e. P(B/A).

Hence P(A ∩ B) = P(A) P(B/A).
Since the joint event A ∩ B involves A and B symmetrically, interchanging A and B, we get

P(A ∩ B) = P(B) P(A/B).
This rule may be stated as below:

"The probability that two events A and B will both occur is equal to the probability that one of the events will occur multiplied by the conditional probability that the other event will occur given that the first event has occurred."

This rule may be extended to several events. In the case of three events A, B and C, we have
P(A ∩ B ∩ C) = P(D ∩ C), where D = A ∩ B

= P(D) P(C/D)

= P(A ∩ B) P(C/A ∩ B)

= P(A) P(B/A) P(C/A ∩ B).

Similarly, for more than three events, the formula may be proved by mathematical induction.


Example 6.20 A box contains 15 items, 4 of which are defective and 11 are good. Two items are selected at random. What is the probability that the first is good and the second defective?

Let A represent the event that the first item selected is good, and B the event that the second item is defective.

Then we need to calculate the probability of the joint event A ∩ B by the rule P(A ∩ B) = P(A) P(B/A).

Now P(A) = 11/15

Given that the event A has occurred, there remain 14 items of which 4 are defective. Therefore the probability of selecting a defective item after a good one has been selected is P(B/A) = 4/14.

Hence P(A ∩ B) = P(A) P(B/A) = 11/15 × 4/14 = 44/210 = 22/105.

Example 6.21 Two cards are dealt from a pack of ordinary playing cards. Find the probability that the second card dealt is a heart.

Let H1 represent the event that the first card dealt is a heart, and H2 the event that the second card dealt is a heart. Then

P(second card is a heart) = P(first card is a heart and second card is a heart) + P(first card is not a heart and second card is a heart).

i.e. P(H2) = P(H1 ∩ H2) + P(H̄1 ∩ H2)
= P(H1) P(H2/H1) + P(H̄1) P(H2/H̄1)
= (13/52 × 12/51) + (39/52 × 13/51)
= 663/2652 = 1/4.


Example 6.22 Box A contains 5 green and 7 red balls. Box B contains 3 green, 3 red and 6 balls of another colour. A box is selected at random and a ball is drawn at random from it. What is the probability that the ball drawn is green?

Let E represent the event that a green ball is drawn. Then E can occur in one of the following mutually exclusive ways:

i) Box A is selected and a green ball is drawn, i.e. A ∩ E, or
ii) Box B is selected and a green ball is drawn, i.e. B ∩ E.
Therefore P(green ball) = P(box A and green ball) + P(box B and green ball)
= P(box A) P(green ball/box A) + P(box B) × P(green ball/box B)

In symbols, P(E) = P(A ∩ E) + P(B ∩ E)
= P(A) P(E/A) + P(B) P(E/B)
= (1/2 × 5/12) + (1/2 × 3/12)
= 5/24 + 3/24 = 1/3.


Example 6.23 An urn contains 10 white and 3 black balls. Another urn contains 3 white and 5 black balls. Two balls are transferred from the first urn and placed in the second, and then one ball is taken from the latter. What is the probability that it is a white ball? (P.U., B.A./B...)

Let A represent the event that 2 balls are drawn from the first urn and transferred to the second urn. Then A can occur in the following three mutually exclusive ways:

A1 = 2 white balls,
A2 = 1 white ball and 1 black ball,
A3 = 2 black balls.

Their probabilities are

P(A1) = C(10, 2)/C(13, 2) = 45/78,
P(A2) = C(10, 1) C(3, 1)/C(13, 2) = 30/78,
P(A3) = C(3, 2)/C(13, 2) = 3/78.

The second urn, after having received 2 balls from the first urn, contains
i) 5 white and 5 black balls (2 white balls transferred)
ii) 4 white and 6 black balls (1 white and 1 black ball transferred)
iii) 3 white and 7 black balls (2 black balls transferred)

Let W represent the event that a white ball is drawn from the second urn after 2 balls have been transferred from the first urn. Then
P(W) = P(W ∩ A1) + P(W ∩ A2) + P(W ∩ A3)
Now P(W ∩ A1) = 5/10 × 45/78 = 15/52,
P(W ∩ A2) = 4/10 × 30/78 = 2/13,
P(W ∩ A3) = 3/10 × 3/78 = 3/260.
Hence P(W) = 15/52 + 2/13 + 3/260
= 118/260 = 59/130 = 0.4538
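The case-by-case reasoning of Example 6.23 is an instance of the rule of total probability, and can be sketched as a loop over the number of white balls transferred. The function name `p_white_after_transfer` and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb

def p_white_after_transfer(w1, b1, w2, b2, moved=2):
    """P(white ball from urn 2) after moving `moved` balls from urn 1.

    w1, b1: white and black balls in the first urn;
    w2, b2: white and black balls in the second urn before the transfer.
    """
    total_ways = comb(w1 + b1, moved)
    p_white = Fraction(0)
    for k in range(moved + 1):  # k = number of white balls transferred
        p_transfer = Fraction(comb(w1, k) * comb(b1, moved - k), total_ways)
        p_white_given_transfer = Fraction(w2 + k, w2 + b2 + moved)
        p_white += p_transfer * p_white_given_transfer
    return p_white

p_w = p_white_after_transfer(10, 3, 3, 5)
print(p_w, round(float(p_w), 4))  # 59/130 0.4538
```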
Example 6.24 A card is drawn at random from a deck of ordinary playing cards. What is the probability that it is a diamond, a face card or a king? (P.U., B.A./B.Sc. 1992)

This is the same problem as Example 6.16. A second approach is given below:
Let A = the card drawn is a diamond,
B = the card drawn is a face card, and
C = the card drawn is a king.
We need
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C).

Now P(A) = 13/52, P(B) = 12/52, P(C) = 4/52,

P(A ∩ B) = P(A) P(B/A) = 13/52 × 3/13 = 3/52,
P(B ∩ C) = P(B) P(C/B) = 12/52 × 4/12 = 4/52,
P(A ∩ C) = P(A) P(C/A) = 13/52 × 1/13 = 1/52 (or P(C) P(A/C) = 4/52 × 1/4 = 1/52),

P(A ∩ B ∩ C) = P(A) P(B/A) P(C/A ∩ B)
= 13/52 × 3/13 × 1/3 = 1/52.

Hence P(A ∪ B ∪ C) = 13/52 + 12/52 + 4/52 - 3/52 - 4/52 - 1/52 + 1/52


= 22/52 = 0.423

Example 6.25 Three urns of the same appearance are given as follows:

Urn A contains 5 red and 7 white balls.
Urn B contains 4 red and 3 white balls.
Urn C contains 3 red and 4 white balls.

An urn is selected at random and a ball is drawn from the urn.
i) What is the probability that the ball drawn is red?
ii) If the ball drawn is red, what is the probability that it came from urn A?

[Probability tree diagram: each of the urns A, B and C is selected with probability 1/3; urn A then gives R with probability 5/12 and W with probability 7/12, urn B gives R with 4/7 and W with 3/7, urn C gives R with 3/7 and W with 4/7.]

Here we first select one of the three urns, then we draw a ball which is either red (R) or white (W). In other words, we perform a sequence of two experiments. This process is described by a probability tree diagram, in which each branch of the tree gives the respective probability.

Now the probability of selecting urn A, for instance, and then a red ball (R) is 1/3 × 5/12 = 5/36, as the probability that any particular path of the tree occurs is, by the multiplication law, the product of the probabilities of the branches along that path.

i) Now the probability of drawing a red ball is given by the relation
P(R) = P(A) P(R/A) + P(B) P(R/B) + P(C) P(R/C)
as there are three mutually exclusive paths leading to the drawing of a red ball. Thus

P(R) = (1/3 × 5/12) + (1/3 × 4/7) + (1/3 × 3/7) = 119/252 = 0.4722

ii) Here we need the probability that urn A is selected, given that the ball drawn is red, i.e. P(A/R).
By definition, P(A/R) = P(A ∩ R)/P(R)

Now P(A ∩ R) = Probability that urn A is selected and a red ball is drawn
= 1/3 × 5/12 = 5/36

Hence P(A/R) = (5/36)/(119/252) = 35/119 = 5/17.
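Parts (i) and (ii) of Example 6.25 follow one pattern: multiply along each path of the tree, add the paths to obtain P(R), then divide each path by that total. A minimal sketch of this calculation, with the dictionary layout chosen only for illustration:

```python
from fractions import Fraction

# (red, white) contents of each urn, as given in the example
urns = {"A": (5, 7), "B": (4, 3), "C": (3, 4)}

prior = Fraction(1, len(urns))  # each urn is equally likely to be chosen

# Joint probability of each path: P(urn) * P(R / urn)
joint = {
    name: prior * Fraction(red, red + white)
    for name, (red, white) in urns.items()
}

p_red = sum(joint.values())  # total probability of drawing a red ball
posterior = {name: joint[name] / p_red for name in urns}  # P(urn / R)

print(p_red)           # 17/36
print(posterior["A"])  # 5/17
```

Here 17/36 is 119/252 in lowest terms, and the three posterior probabilities add up to 1.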


INDEPENDENT AND DEPENDENT EVENTS 


Two events A and B in the same sample space S are defined to be independent (or statistically independent) if the probability that one event occurs is not affected by whether the other event has or has not occurred, that is
P(A/B) = P(A) and P(B/A) = P(B).
It then follows that two events A and B are independent if and only if
P(A ∩ B) = P(A) P(B)

The events A and B are defined to be dependent if P(A ∩ B) ≠ P(A) × P(B). This means that the occurrence of one of the events in some way affects the probability of the occurrence of the other event.

It is to be emphasized that two mutually exclusive events A and B are independent if and only if P(A) P(B) = 0, which is true when either P(A) = 0 or P(B) = 0. If two independent events A and B both have non-zero probabilities, then P(A ∩ B) = P(A) P(B) > 0, so they must have a sample point in common. Thus two independent events with non-zero probabilities can never be mutually exclusive. Moreover, two events with non-zero probabilities that are mutually exclusive are also dependent events, and events that are non-mutually exclusive may either be independent or dependent events.

Three events A, B and C, all defined on the same sample space, are said to be independent if and only if they satisfy the following conditions:

i) They are pairwise independent, i.e. P(A ∩ B) = P(A) P(B);
P(A ∩ C) = P(A) P(C); P(B ∩ C) = P(B) P(C).

ii) They are mutually independent, i.e.
P(A ∩ B ∩ C) = P(A) P(B) P(C).

In general, the k events A1, A2, ..., Ak are defined to be mutually independent if and only if the probability of the intersection of any 2, 3, ..., or k of them equals the product of their respective probabilities.


Example 6.26 Two events A and B are such that P(A) = 1/4, P(A/B) = 1/2 and P(B/A) = 2/3.

i) Are A and B independent events?
ii) Are A and B mutually exclusive events?
iii) Find P(A ∩ B) and P(B).

i) If A and B are independent events, then P(A/B) = P(A).
Now P(A) = 1/4 and P(A/B) = 1/2, i.e. P(A/B) ≠ P(A).

Hence A and B are not independent events.
ii) If A and B are mutually exclusive events, then P(A/B) = 0.

But it is given that P(A/B) = 1/2.

Hence A and B are not mutually exclusive events.
iii) Now P(A ∩ B) = P(A) P(B/A)
= 1/4 × 2/3 = 1/6
By definition, we have
P(B) P(A/B) = P(A) P(B/A)
i.e. P(B) (1/2) = (1/4)(2/3), so that P(B) = 1/3.
Example 6.27 Two fair dice, one red and one green, are thrown. Let A denote the event that the red die shows an even number and B the event that the green die shows a 5 or a 6. Show that the events A and B are independent.

The sample space S contains 36 equally likely outcomes.
Given A = event that the red die shows an even number,

B = event that the green die shows a 5 or a 6, and therefore
A ∩ B = event that the red die shows an even number and the green die shows a 5 or a 6.

Then A contains 18 outcomes, B contains 12 and A ∩ B contains only 6 outcomes.
Associating with each outcome a probability of 1/36, we get
P(A) = 18/36 = 1/2, P(B) = 12/36 = 1/3 and P(A ∩ B) = 6/36 = 1/6.

Since P(A ∩ B) = 1/6 = 1/2 × 1/3 = P(A) P(B), therefore the events A and B are independent.

Alternatively,
P(A/B) = P(A ∩ B)/P(B) = (1/6)/(1/3) = 1/2 = P(A), and
P(B/A) = P(A ∩ B)/P(A) = (1/6)/(1/2) = 1/3 = P(B).


Hence the events A and B are independent.

Example 6.28 Let A be the event that a family has children of both sexes and B be the event that a family has at most one boy. If a family is known to have (i) three children, then show that A and B are independent events, (ii) four children, then show that A and B are dependent events.

(P.U., B.A. (Hons.) Part-I, 1970)

Let b denote a boy and g a girl. Then
i) the equiprobable sample space S would be
S = {bbb, bbg, bgb, gbb, bgg, gbg, ggb, ggg}
The two events are
A = {children of both sexes}
= {bbg, bgb, gbb, bgg, gbg, ggb}, and
B = {at most one boy}
= {bgg, gbg, ggb, ggg}
The event A ∩ B is
A ∩ B = {bgg, gbg, ggb}
Thus their respective probabilities are
P(A) = 6/8 = 3/4, P(B) = 4/8 = 1/2, and P(A ∩ B) = 3/8.

Now P(A) P(B) = 3/4 × 1/2 = 3/8 = P(A ∩ B).

Hence A and B are independent.

ii) the sample space S may be represented by the following 16 equally likely outcomes:

S = {bbbb, bbbg, bbgb, bgbb, gbbb, bbgg, bgbg, gbbg, gbgb, bggb, ggbb, bggg, gbgg, ggbg, gggb, gggg}

The events are:

A = {bbbg, bbgb, bgbb, gbbb, bbgg, bgbg, gbbg, gbgb, bggb, ggbb, bggg, gbgg, ggbg, gggb}
B = {bggg, gbgg, ggbg, gggb, gggg}, and

A ∩ B = {bggg, gbgg, ggbg, gggb}

Their probabilities are

P(A) = 14/16 = 7/8, P(B) = 5/16 and P(A ∩ B) = 4/16 = 1/4.

Now P(A) P(B) = 7/8 × 5/16 = 35/128 ≠ P(A ∩ B).

Hence A and B are dependent events.

Theorem 6.8 If A and B are two independent events, then
P(A ∩ B) = P(A) P(B).
Proof. Since A and B are independent events, therefore
P(A) = P(A/B) and P(B) = P(B/A).
Substituting these results in the general rule of multiplication, we obtain
P(A ∩ B) = P(A) P(B).
Theorem 6.9 If A and B are two independent events in a sample space S, then (i) A and B̄ are independent, (ii) Ā and B are independent, and (iii) Ā and B̄ are independent.
Proof. (i) The events A ∩ B and A ∩ B̄ are mutually exclusive and their union is
A = (A ∩ B) ∪ (A ∩ B̄).

Therefore P(A) = P(A ∩ B) + P(A ∩ B̄)
or P(A ∩ B̄) = P(A) - P(A ∩ B)
= P(A) - P(A) P(B) [∵ A and B are independent]
= P(A) [1 - P(B)] = P(A) P(B̄)
Hence A and B̄ are independent.
(ii) Similarly,
P(B) = P(B ∩ A) + P(B ∩ Ā)
or P(B ∩ Ā) = P(B) - P(B ∩ A)
= P(B) - P(B) P(A) [∵ A and B are independent]
= P(B) [1 - P(A)] = P(B) P(Ā)
Therefore Ā and B are independent.
(iii) Using De Morgan's law, Ā ∩ B̄ = (A ∪ B)', where the prime denotes the complement, we have
P(Ā ∩ B̄) = P[(A ∪ B)']
= 1 - P(A ∪ B)
= 1 - P(A) - P(B) + P(A ∩ B)
= 1 - P(A) - P(B) + P(A) P(B)
= [1 - P(A)][1 - P(B)] = P(Ā) P(B̄)
which shows that Ā and B̄ are independent.

Example 6.29 Two cards are drawn from a well-shuffled ordinary deck of 52 cards. Find the probability that they are both aces if the first card is (i) replaced, (ii) not replaced.

(P.U., B.A./B.Sc., 1967; M.A. Econ. 1969)
Let A denote the event ace on the first draw and B denote the event ace on the second draw.
i) In case of replacement, the events A and B are independent.
Thus P(both cards are aces) = P(A ∩ B) = P(A) P(B)
= 4/52 × 4/52 = 1/169.
ii) If the first card is not replaced, then A and B are dependent events and therefore

P(both cards are aces) = P(first card is an ace) × P(second card is an ace given that the first card is an ace)

i.e. P(A ∩ B) = P(A) P(B/A) = 4/52 × 3/51 = 1/221.


Example 6.30 A pair of fair dice is thrown twice. What is the probability of getting totals of 5 and 11?
(P.U., B.A./B.Sc. 1978)

Let A be the event of getting a total of 5 and B be the event of getting a total of 11, when two dice are thrown.

Then A can occur in the following two ways:

A1 = {a total of 5 occurs on the first throw},

A2 = {a total of 5 occurs on the second throw},

and B can occur in the following two ways:
B1 = {a total of 11 occurs on the first throw},

B2 = {a total of 11 occurs on the second throw}.
The joint event A ∩ B can occur in two mutually exclusive ways, A1 ∩ B2 or B1 ∩ A2, i.e.
A ∩ B = (A1 ∩ B2) ∪ (B1 ∩ A2),
P(A ∩ B) = P(A1 ∩ B2) + P(B1 ∩ A2)
= P(A1) P(B2) + P(B1) P(A2)
(∵ the events are independent)

= (4/36 × 2/36) + (2/36 × 4/36) = 1/162 + 1/162 = 1/81.


Example 6.31 The probability that a man will be alive in 25 years is 3/5, and the probability that his wife will be alive in 25 years is 2/3. Find the probability that (i) both will be alive, (ii) only the man will be alive, (iii) only the wife will be alive, (iv) at least one will be alive and (v) neither will be alive in 25 years.

(P.U., B.A./B.Sc. 197...)

Let A be the event that the man will be alive and B be the event that his wife will be alive in 25 years. Then

P(A) = 3/5, and P(B) = 2/3.

i) We need the probability that both will be alive, i.e. P(A ∩ B).
Since A and B are independent, therefore

P(A ∩ B) = P(A) P(B) = 3/5 × 2/3 = 2/5.

ii) We need the probability that only the man will be alive, i.e. P(A ∩ B̄).
Since A and B̄ are independent and P(B̄) = 1 - P(B), therefore

P(A ∩ B̄) = P(A) P(B̄) = 3/5 (1 - 2/3) = 1/5.

iii) We require the probability that only the wife will be alive, i.e. P(Ā ∩ B). Thus

P(Ā ∩ B) = P(Ā) P(B) = (1 - 3/5)(2/3) = 4/15, as the events Ā and B are independent and P(Ā) = 1 - P(A).

iv) We require the probability that at least one will be alive, i.e. P(A ∪ B).
Since the events A and B are independent and not mutually exclusive, therefore
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= 3/5 + 2/3 - 2/5 = 13/15.

v) We need the probability that neither will be alive, i.e. P(Ā ∩ B̄).
Since Ā and B̄ are independent, therefore
P(Ā ∩ B̄) = P(Ā) P(B̄)
= 1- P(A)] 1 Pani 
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6.10. Independent Repeated Trials with two Outcomes. If the probability of an event A 
ing in a ingle trial is p, then the probability of its occurring k times in т independent trials is given 


P(A)= nz 47%, where q = 1 - p. 


21 


ї If the event A occurs Ё times in n independent trials, then the event 4 (not-4) will occur in the 
ing (n-k) trials, that is 


ААА...А AAA...A 
— йке «елене 


k times n-k mes. 


The probability of this sequence would be 
РРР---Р 999-4 
EC cm ҚҰРТ E 


EU E orna a MSS аа т k) times. The 
of possible sequences is (4 M 
Hence the required probability is s 

УК ys k nk 


ing to note that this is a special sex, general result, called the binomial law, which is 
in a bit detail in chapter 8. x 
Example 6.32 Five coins are tossed (or one coin is tossed five times). What is the probability of
precisely 3 heads?

We know that the probability of a head is $\frac{1}{2}$. Hence

$P(\text{precisely 3 heads}) = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = 10 \times \left(\frac{1}{2}\right)^5 = \frac{5}{16}$.

Example 6.33 If 60 percent of the voters in the City of Lahore prefer candidate X, what is the
probability that in a sample of 12 voters exactly 7 will prefer X?

Here n = 12, k = 7, p = 0.60 and q = 0.40. Therefore

$P(7 \text{ out of } 12 \text{ prefer } X) = \binom{12}{7}(0.60)^7(0.40)^{12-7}$

$= (792)(0.02799)(0.01024) = 0.227$
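
The formula of Section 6.10 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `binomial_probability` is chosen here for illustration only. It evaluates the probability of exactly k occurrences in n independent trials and reproduces the answers of Examples 6.32 and 6.33.

```python
from math import comb

def binomial_probability(n: int, k: int, p: float) -> float:
    """Probability that an event with single-trial probability p
    occurs exactly k times in n independent trials."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)

# Example 6.32: precisely 3 heads in 5 tosses of a fair coin.
heads_3_of_5 = binomial_probability(5, 3, 0.5)

# Example 6.33: exactly 7 of 12 voters prefer X when p = 0.60.
prefer_7_of_12 = binomial_probability(12, 7, 0.60)

print(heads_3_of_5)              # 0.3125, i.e. 5/16
print(round(prefer_7_of_12, 3))  # 0.227
```

Summing `binomial_probability(n, k, p)` over k = 0, 1, ..., n should give 1 (up to rounding), which is a quick way to catch a mistyped exponent.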


Theorem 6.11 Bayes' theorem. If the events $A_1, A_2, \ldots, A_k$ form a partition of sample space S,
that is, the events $A_i$ are mutually exclusive and their union is S, and if B is any other event of S such that
it can occur only if one of the $A_i$ occurs, then for any i,

$P(A_i/B) = \dfrac{P(A_i)\,P(B/A_i)}{\sum_{j=1}^{k} P(A_j)\,P(B/A_j)}$, for i = 1, 2, ..., k.

Proof: By the multiplicative law of probabilities, we have

$P(B \cap A_i) = P(B)\,P(A_i/B) = P(A_i)\,P(B/A_i)$.

Equating the equivalent relations of $P(B \cap A_i)$ and solving for $P(A_i/B)$, we get

$P(A_i/B) = \dfrac{P(A_i)\,P(B/A_i)}{P(B)}$.

We may write the event B as (see the Venn diagram)

$B = S \cap B$
$= (A_1 \cup A_2 \cup \ldots \cup A_k) \cap B$
$= (A_1 \cap B) \cup (A_2 \cap B) \cup \ldots \cup (A_k \cap B)$,

where the $A_i \cap B$ are also mutually exclusive events.

Therefore $P(B) = P(A_1 \cap B) + P(A_2 \cap B) + \ldots + P(A_k \cap B)$.

Using the multiplicative law of probabilities, we may express each term $P(A_i \cap B)$ as
$P(A_i)\,P(B/A_i)$. Then

$P(B) = P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + \ldots + P(A_k)\,P(B/A_k) = \sum_{j=1}^{k} P(A_j)\,P(B/A_j)$.

[Figure: Venn diagram of the partition $A_1, A_2, \ldots, A_k$ of S; B is shaded]

This result is generally known as the theorem on total probability. Replacing P(B) by the total
probability formula for the event B, we obtain Bayes' formula as

$P(A_i/B) = \dfrac{P(A_i)\,P(B/A_i)}{\sum_{j=1}^{k} P(A_j)\,P(B/A_j)}$.

This result is known as Bayes' theorem after an English clergyman Thomas Bayes
who derived it and first used it in a paper that was published posthumously in 1763. It should be noted that
the original probabilities $P(A_i)$ are known as the a priori probabilities and the conditional probabilities
$P(A_i/B)$ are called the a posteriori or inverse probabilities because probabilities are revised after some
information has been obtained. Bayes' formula is also called the formula for probabilities of
hypotheses on account of the reason that the events $A_1, A_2, \ldots, A_k$ may be thought of as hypotheses to
account for occurrence of the event B.


Example 6.34 In a bolt factory, machines A, B and C manufacture 25, 35 and 40 percent of the total
output, respectively. Of their outputs, 5, 4, 2 percent, respectively, are defective bolts. A bolt is
selected at random and found to be defective. What is the probability that the bolt came from machine A?

The a priori probabilities (before the information that the bolt is defective) are
P(A) = 0.25, P(B) = 0.35, and P(C) = 0.40.
Let E represent the event that a bolt is defective (D).
Then the conditional probabilities are
P(E/A) = 0.05, P(E/B) = 0.04, and P(E/C) = 0.02.

The outcomes with their respective probabilities may be shown by a tree diagram as below:

Prior Probability     Conditional Probability     Joint Event
P(A) = 0.25           P(E/A) = 0.05               P(A and D) = P(A) . P(E/A)
P(B) = 0.35           P(E/B) = 0.04               P(B and D) = P(B) . P(E/B)
P(C) = 0.40           P(E/C) = 0.02               P(C and D) = P(C) . P(E/C)

P(A/E) is the a posteriori probability that the selected defective bolt came from machine A.
By Bayes' theorem, we get

$P(A/E) = \dfrac{P(A)\,P(E/A)}{P(A)\,P(E/A) + P(B)\,P(E/B) + P(C)\,P(E/C)}$

$= \dfrac{(0.25)(0.05)}{(0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02)} = \dfrac{0.0125}{0.0345} = 0.362$

Similarly, the posterior probabilities of machines B and C are
P(B/E) = 0.406, and P(C/E) = 0.232
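
The computation in Example 6.34 follows a fixed pattern: multiply each prior by its conditional probability, add the products to get the total probability, then divide. A minimal Python sketch of that pattern is given below; the function name `posterior` and the dictionary layout are illustrative choices made here, not notation from the text.

```python
def posterior(priors, likelihoods):
    """Return (posterior probabilities, total probability of the evidence).

    priors[h]      -- a priori probability P(h) of hypothesis h
    likelihoods[h] -- conditional probability P(evidence / h)
    """
    # Joint probabilities P(h) * P(evidence / h) for every hypothesis.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Theorem on total probability: P(evidence) is the sum of the joints.
    total = sum(joint.values())
    # Bayes' formula: divide each joint probability by the total.
    return {h: joint[h] / total for h in joint}, total

# Data of Example 6.34 (machines A, B, C and their defective rates).
priors = {"A": 0.25, "B": 0.35, "C": 0.40}
defective_rate = {"A": 0.05, "B": 0.04, "C": 0.02}

post, p_defective = posterior(priors, defective_rate)
for machine, prob in post.items():
    print(machine, round(prob, 3))   # A 0.362, B 0.406, C 0.232
```

The three posterior probabilities add to 1, as they must, since the machines form a partition of the sample space.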


Example 6.35 An urn contains four balls which are known to be either (i) all white or (ii) two
white and two black. A ball is drawn at random and is found to be white. What is the probability that all
the balls are white? (P.U., B.A./B.Sc. (Hons.) Part-II, 1

Let $A_1$ be the hypothesis that all the balls are white and $A_2$ be the hypothesis that two are white
and two black. Then the a priori probabilities must be

$P(A_1) = P(A_2) = \frac{1}{2}$, as the selection of a hypothesis is random.

Again let B be the event that the ball drawn is white. Then the conditional probabilities are

$P(B/A_1) = \frac{4}{4} = 1$ and $P(B/A_2) = \frac{2}{4} = \frac{1}{2}$.

Therefore by Bayes' theorem, we get the a posteriori probabilities

$P(A_1/B) = \dfrac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \dfrac{\frac{1}{2}(1)}{\frac{1}{2}(1) + \frac{1}{2}\left(\frac{1}{2}\right)} = \dfrac{2}{3}$,

$P(A_2/B) = \dfrac{P(A_2)\,P(B/A_2)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)} = \dfrac{\frac{1}{2}\left(\frac{1}{2}\right)}{\frac{1}{2}(1) + \frac{1}{2}\left(\frac{1}{2}\right)} = \dfrac{1}{3}$.

Hence the first hypothesis, i.e. all the balls are white, is preferred as it has the greater
probability.
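
A posterior probability such as the one in Example 6.35 can also be checked by simulation. The sketch below is an illustration added here, not the book's method: it assumes the two hypotheses are equally likely (as the example states), computes the exact value with fractions, and then estimates the same quantity as a relative frequency among the simulated draws that turned out white. The function name `simulate` and the seed are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact a posteriori probability of "all white" given a white ball,
# using priors 1/2, 1/2 and conditional probabilities 1 and 1/2.
half = Fraction(1, 2)
exact = (half * 1) / (half * 1 + half * half)

def simulate(trials: int, seed: int = 0) -> float:
    """Estimate P(all white / white ball drawn) by relative frequency."""
    rng = random.Random(seed)
    urns = {"all_white": ["W"] * 4, "two_white": ["W", "W", "B", "B"]}
    white_draws = 0
    white_from_all_white = 0
    for _ in range(trials):
        name = rng.choice(["all_white", "two_white"])  # pick a hypothesis
        ball = rng.choice(urns[name])                  # draw one ball
        if ball == "W":
            white_draws += 1
            if name == "all_white":
                white_from_all_white += 1
    return white_from_all_white / white_draws

estimate = simulate(100_000)
print(exact)      # 2/3
print(estimate)   # close to 0.667
```

Only the trials in which a white ball was drawn enter the denominator, which is exactly what conditioning on the event B means.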


EXERCISES


OBJECTIVE
Answer 'True' or 'False'. If the statement is not true then replace the underlined words to
make the statement true:


i) The probability of an event is a whole number.
ii) If two events are mutually exclusive, they are also independent.
iii) If A and B are mutually exclusive events, the sum of their probabilities equal to one.
iv) The sample points of a sample space are equally likely events.
v) If the sets of sample points belonging to two different events do not intersect, the events are
independent.
vi) Probability of head on tossing a coin is 4.
vii) If P(A/B) = P(A) and P(B/A) = P(B) then the two events A and B are independent.
viii) It is always true that $P(\bar{A}) = 1 - P(A)$.
ix) The probabilities of complementary events always are equal.
x) If events A and B are statistically independent then $P(A \cap B) = P(A) + P(B)$.


MULTIPLE CHOICE QUESTIONS


i) A simple event
a) is a collection of exactly two outcomes.
b) does not include any outcome.
c) includes one and only one outcome.
d) includes more than one events.


ii) A compound event includes
a) at least four outcomes
b) one and only one outcome
c) at least two outcomes
d) all the outcomes of an experiment


iii) The probability of an event is always
a) greater than zero
b) less than 1
c) in the range zero to 1
d) greater than 1


iv) The classical probability method is applied to an experiment that
a) cannot be repeated.
b) has equally likely outcomes.
c) has all independent outcomes.
d) does not have more than two outcomes.


v) The relative frequency method is applied to an experiment that
a) does not have equally likely outcomes but can be repeated.
b) does not have equally likely outcomes and cannot be repeated.
c) has equally likely outcomes and cannot be repeated.
d) has all independent outcomes.


vi) Which of the following values cannot be the probability of an event?
a) .82
b) 0
c) 1.76
d) 0.36

vii) In a group of 400 families, 300 own houses. If one family is randomly selected from this
group the probability that this family owns a house is:
a) .75
b) .25
c) .80
d) .40


viii) Two mutually exclusive events
a) always occur together
b) cannot occur together
c) can sometimes occur together.
d) can never occur together.

ix) The two events A and B are mutually exclusive. Which one of the following must
be true?
a) $P(A \cap B) = 0$.
b) $P(A \cap B) = 1$.
c) $P(A \cup B) = 0$.
d) $P(A \cup B) = 1$.


x) P(A) = 0.6 and P(B) = 0.5. Which of the following statements is true?
a) A and B are mutually exclusive.
b) A and B are not mutually exclusive.
c) A and B are independent.
d) A and B are dependent.


xi) The conditional probability of event A given that the event B has already occurred is written
as:
a) P(A-B)
b) P(B/A)
c) $P(A \cap B)$
d) P(A/B)


xii) Two complementary events
a) have no common outcomes
b) have common outcomes
c) contain the same outcomes
d) can have common outcomes.


xiii) The union of two events A and B is written as:
a) (A or B)
b) (A and B)
c) (B/A)
d) (A/B)


xiv) The intersection of two events A and B is written as:
a) (A or B)
b) (A and B)
c) (AB)
d) (A/B)


xv) The joint probability of two independent events A and B is:
a) P(A) + P(B)
b) $P(A) + P(B) - P(A \cap B)$
c) P(A)P(B)
d) P(A)P(A/B)

List all the proper subsets of the universal set 
S = {chair, student, pen}. 


Construct a Venn diagram to illustrate the following subsets of 
S = {ball, pen, table, coin, die, card, book},
A = {ball, pen, book}, B = {pen, table, coin}, C = {card}.


6.4 a) Let A = {1, 2}, B = {2, 3} and C = {3} be subsets of the universal set S = {1,
Determine the elements of the following sets:
i) $A \times A$, ii) $A \times B$, iii) $B \times A$,
iv) $(A \times B) \cup (B \times C)$, v) $(A \times B) \cap (B \times C)$


b) Let A = {2, 3}, B = {1, 3, 5} and C = {4, 6}. Construct the "tree diagram" of $A \times B \times C$.
6.5 a) Explain what is meant by a Random Experiment, a Sample Space and an Event.


b) Let A, B and C be events (subsets) in a sample space S defined by

S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {2, 3, 4}, B = {3, 4, 5}, C = {5, 6, 7}

List the members of the following events:

i) A OB, ii) A OB, iii) ANB, iv) Ағ(В.С),


6.6 If S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, A = {0, 2, 4, 6, 8},
B = {1, 3, 5, 7, 9}, C = {2, 3, 4, 5} and D = {1, 6, 7}, list the elements in the following sets:
i) $A \cup C$, ii) $A \cap B$, iii) $\bar{C}$, iv) $(\bar{C} \cap D) \cup B$,
v) $\overline{(S \cap C)}$, vi) $A \cap C \cap \bar{D}$


6.7 a) A pair of dice is rolled. List the elements of the sample space S. Let A denote the event
"the sum is less than 5" and B the event "a 6 occurs on either die". List the elements
corresponding to event A and to event B.


b) Two dice are rolled. Let A be the event that the sum of dots on the faces s
and B the event that there is at least one 3 shown. Describe AUB; ANB: А
(An B) UA.
a) Enumerate all the possible (i) combinations and (ii) permutations of 3 letters from
the four letters A, B, C and D.


b) A club consists of 15 members. In how many ways can:
i) the three officers; president, vice-president, and secretary-treasurer, be chosen?
ii) a committee of three members be selected?


How many different bridge hands can be selected from an ordinary deck of 52 playing cards?


We have п = 10 persons and we wish to divide them at random into 3 groups со! 
and 2 persons respectively. In how many ways is this possible? 


a) Define the terms: Experiment, Outcome, Event, Sample Space, Simple and Compound
Events, Mutually Exclusive Events.


b) Show that a sample space consisting of 4 elements has $2^4$ different events.


c) Prove that $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \ldots + \binom{n}{n} = 2^n$,
using the fact that a sample space with n elements has $2^n$ subsets.


Distinguish among classical or a priori probability, relative frequency or a posteriori
probability, axiomatic probability and subjective or personalistic probability. What is the
disadvantage of each? Why do we study probability theory?


Explain what is wrong with each of the following statements:


a) An investment counselor claims that the probability that a stock's price will go up is 0.60,
remain unchanged is 0.38, or go down is 0.25.


b) If two coins are tossed, there are 3 possible outcomes: 2 heads, one head and one tail, and
2 tails. Hence probability of each of these outcomes is $\frac{1}{3}$.


c) The probabilities that a certain truck driver would have no, one and two or more accidents
during the year are 0.90, 0.02, and 0.09.


d) P(A) = 3.Р(В8)- i -P(C) = + for the probabilities of three mutually exclusive events A,
B and C.


Find the probability for each of the following events:

i) An odd number appears in a single toss of a fair die.
ii) The sum 8 appears in a single toss of a pair of fair dice.
iii) At least one head appears in three tosses of a fair coin.
iv) A king, ace, jack of clubs or queen of diamonds appears in drawing a single card from a
well-shuffled ordinary deck of 52 cards. (P.U., B.A./B.Sc. 1970)


a) Describe the classical, relative frequency and subjective concepts of probability.


b) A marble is drawn at random from a box containing 10 red, 30 white, 20 blue and 15
orange marbles. Find the probability that it is (i) orange or red, (ii) not-'red or blue', (iii)
not blue, (iv) white, (v) red, white or blue. (P.U., B.A./B.Sc. 1970)


a) If two dice are thrown, what are the various total number of dots that may turn up? What
are the probabilities of each of them? What is the probability that the number of dots will
total at least four?


b) Show that in a single throw of two dice, the probability of throwing more than 7 is equal
to that of throwing less than 7, and hence find the probability of throwing exactly 7. State
clearly what assumptions you are making. (P.U., B.A./B.Sc. 1981)


Two dice are thrown. Let A be the event that the sum of the upper face numbers is odd,
and B the event of at least one ace. Assuming a sample space of 36 points, list the sample
points which belong to the events $A \cap B$, $A \cup B$ and $A \cap \bar{B}$. Find the probabilities of
these events, assuming equally likely events.

Two good dice are rolled simultaneously. Let A denote the event "the sum shown is 8"
and B the event "the two show the same number". Find P(A), P(B), $P(A \cap B)$, and
$P(A \cup B)$.


A box contains six discs numbered 1 to 6. Find for each integer k from 3 to 11, the probability
that the numbers on two discs drawn without replacement have a sum equal to k.
(P.U., B.A./B.Sc. 1977)

6.18 In a single throw of two fair dice, find the probability that the product of the numbers on the
dice is (i) between 8 and 16 (both inclusive), (ii) divisible by 4. (P.U., B.A./B.Sc.

6.19 a) Elaborate the statement that "two mutually exclusive events need not be equally
by giving suitable examples.


b) Compare the probabilities of at least one 6 in 4 tosses of a fair die with the probability of
at least one double 6 in 24 tosses of two fair dice.


c) Compare the probability of a total of 9 with that of a total of 10 when three fair dice are
tossed once.


Hint: To find the sample points in different events, combine the faces of the third
relevant sums of the first two dice.


6.20 a) From a pack of 52 cards, two are drawn at random. Find the probability that one
and the other a queen. (P.U., B.A/B.


b) A set of eight cards contains one joker. A and B are two players and A chooses 5 cards at
random, B taking the remaining 3 cards. What is the probability that A has the joker?


6.21 a) Of 12 eggs in a refrigerator, 2 are bad. From thi eggs are chosen at random
cake. What are the probabilities that (i) exactly one is bad? (ii) at least one is


b) A certain carton of eggs has 3 bad eggs ae good eggs. An omelette is made
randomly chosen from the carton. 5 the probability that there are (i) no
(ii) at least 1 bad egg, (iii) exactly 2 bad eggs in the omelette?


6.22 a) An integer between 3 and 12 inclusive is chosen at random. What is the probability that it
is an even number? That it is even and divisible by 3?


b) Three distinct integers are chosen at random from the first 20 positive intege
the probability that (i) their sum is even, (ii) their product is even.


c) A box contai a 4 white and 5 green balls. Three balls are drawn
together. Find the probability that they may be (i) all of different colours, (
same col.

6.23 a) Find the probability of obtaining at least one 6 when (i) 5 dice are thrown, (ii
thrown.
b) How many dice must be thrown so that the probability of obtaining at leas
least 0.99?


c) A missile is fired at a target and the probability that the target is hit is 0.7. Fi
missiles should be fired so that the probability that the target is hit at least
than 0.995.


6.24 A bag contains 14 identical balls, 4 of which are red, 5 black and 5 white. Six
from the bag. Find the probability that (i) 3 are red, (ii) at least two are white,
(P.U., B.A. Hons. in


6.25 a) Three applicants are to be selected at random out of 4 boys and 6 girls
probability of selecting (i) all girls, (ii) all boys, (iii) at least one boy?


b) From a group of 6 men and 8 women, 5 people are chosen at random. Find the probability
that there are more men chosen than women.


6.26 A firm buys 3 shipments of parts each month. The purchasing agent selects at random from
among four in-state suppliers and six out-of-state suppliers. What is the probability that the
orders are placed with

i) the in-state suppliers only?
ii) the out-of-state suppliers only?
iii) at least one in-state supplier? (P.U., M.A. Econ. 1979)


6.27 a) Using a sample space or otherwise, show that for any two events A and B,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.


b) A class contains 10 men and 20 women of which half the men and half the women have
brown eyes. Find the probability that a person chosen at random is a man or has brown
eyes. (P.U., B.A./B.Sc. 1974)


c) In a group of 20 adults, 4 out of the 7 women and 2 out of the 13 men wear glasses. What
is the probability that a person chosen at random from the group is a woman or someone
who wears glasses?


The events $E_1$ and $E_2$ are neither independent nor mutually exclusive. Denote by $p_{12}$
the probability that $E_1$ and $E_2$ both happen. Prove that the probability that at least one of
$E_1$ and $E_2$ happens, is $p_1 + p_2 - p_{12}$. (P.U., B.A./B.Sc., 1975)


One integer is chosen at random from the numbers 1, 2, 3, ..., 50. What is the
probability that the chosen integer is divisible by 6 or by 8?


Define the probability of an event.
State and prove the addition law of probability for any two events A and B.


A drawer conta; bolts and 150 nuts. Half of the bolts and half of the nuts are rusted.
If one item is chosen at random, what is the probability that it is rusted or is a bolt?


Define Mutually Exclusive Events. State and prove the theorem of addition of
probabilities concerning mutually exclusive events.


What is the probability of throwing either 7 or 11 with two dice?

If A and B are mutually exclusive events and P(A) = 0.4 and P(B) = 0.5, find (i) (4,5), 
(ii) PCA). 

If A and B are any two events defined on a sample space S, show that
$P[(A \cap \bar{B}) \cup (B \cap \bar{A})] = P(A) + P(B) - 2P(A \cap B)$. (P.U., B.A./B.Sc. 1989)
Let A and B be cvents with PAB) ==. P(A)= i and Р(Ас\В) = i find (i) P(A), 
(ii) P(B), (ii) P(40 В). 5 


€) А) 3. PAB) = > and P(B) = =, then find O (ANB), (i) PCAOB
(ій) P(4 WB) and (iv) РВА).

6.32 Using the Venn diagram, show that
i) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
ii) $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$
(P.U., B.A./B.Sc., 1976,

6.33 a) Explain what is meant by conditional probability.
b) A pair of fair dice is thrown. If the two numbers appearing are different, find
the probability that (i) the sum is 6, (ii) the sum is four or less. (P.U., M.A. Econ. 1

6.34 a) A box contains 4 bad and 6 good tubes. Two are drawn out together. One of them is tested
and found to be good. What is the probability that the other one is also good?


b) Pon NAE такса лы торлау fee sun o£ do M
given that it is odd.


c) In a firm, 20 percent of the employees have accounting background, while 5 percent of
the employees are executives and have accounting background. If an employee has an
accounting background, what is the probability that the employee is an executive?


6.35 A box contains 10 red and 12 white rose flowers. Flowers are picked up at random one by one
without replacement. What is the probability that:
i) the first 3 flowers are red?
ii) there are 2 red and 2 white roses in the first four picked up?
iii) the third one is red given that the first 2 are white? (P.U., B.A./B.Sc., 19

6.36 a) State and prove the multiplication law of probabilities for any two events A and B.
b) Define Independent Events and Dependent Events. Give examples.
c) Given P(A) = 0.60, P(B) = 0.40, $P(A \cap B)$ = 0.24, find P(A/B), P(A B),
P(B/A), P(B ). What is the relation between A and B?

6.37 a) Differentiate between independent and mutually exclusive events. Are independent events
mutually exclusive?


b) Let A and B be two events associated with an experiment. Suppose that P(A) =
$P(A \cup B)$ = 0.7. Let P(B) = p.

i) For what choice of p are A and B mutually exclusive?
ii) For what choice of p are A and B independent? (P.U., M.A.

c) Given P(A) = 0.5 and $P(A \cup B)$ = 0.6, find P(B) if
i) A and B are mutually exclusive. ii) A and B are independent.
iii) P(A/B) = 0.4


Indicate whether each of the following statements is true or false. If false, indicate why.
a) If P(A/B) = 0, then A and B are mutually exclusive.
b) If P(A/B) = 0, then A and B are independent.
c) If P(A/B) = P(B/A), then P(A) = P(B).
d) If A and B are independent, then P(A) = P(B).


a) State and prove the multiplication law of probabilities for independent and dependent
events.


b) Let A and B be two independent events such that the probability is $\frac{1}{8}$ that they will occur
simultaneously and $\frac{3}{8}$ that neither of them will occur. Find P(A) and P(B).
(B.Z.U. & P.U., B.A./B.Sc. 1976)


Two dice are cast: $E_1$ is the event that a 6 appears on at least one die, $E_2$ is the event that a 5
appears on exactly one die and $E_3$ is the event that same number appears on both dice.

i) Are $E_1$ and $E_2$ independent?
ii) Are $E_2$ and $E_3$ independent?
iii) Are $E_1$ and $E_3$ independent? (P.U., B.A./B.Sc. 1980)

a) A can solve 75% of the problems in this book and B can solve 70%. What is the
probability that either A or B can solve a problem chosen at random?


b) Three cards are drawn in succession without replacement from an ordinary deck of
playing cards. Find the probability that the first card is a red ace, the second card is a ten
or jack, and the third is greater than 3 but less than 7.


The probability that A will be alive after 10 years to come is 5/7 and for B it is 7/9. Find out
the probability that (i) both of them will die, (ii) A will be alive and B dead, (iii) B will be
alive and A dead, (iv) both of them will be alive, in 10 years to come.


a) A bag contains 3 red and 5 black balls and another dygd and 7 black balls. А ball is drawn 
from a bag selected at random. Find the probability that it is red.


b) One urn contains 3 white and 2 black balls, another contains 5 white and 3 black balls. If
an urn is chosen at random and a ball is taken from it, what is the probability that it is
white?


Two drawings each of three balls are made from a bag containing 5 white and 8 black balls;
the balls are not being replaced before the next trial. What are the probabilities that the first
drawing will give 3 white balls and the second 3 black balls?


a) Show that the multiplication law $P(A \cap B) = P(A/B)\,P(B)$, established for two events,
may be generalized to three events as follows:
$P(A \cap B \cap C) = P(A/B \cap C)\,P(B/C)\,P(C)$


b) A farmer has a box containing 30 eggs, 5 of which have blood spots. He checks
taking them at random one after another from the box. What is the probability that the
first two eggs have spots and the third will be clear?


6.49 Three groups of children contain respectively 3 girls and 1 boy; 2 girls and 2 boys; 1 girl and 3
boys. One child is selected at random from each group. Show that the probability that the three
selected consists of 1 girl and 2 boys is 13/32.


6.50 There are three families, each having four children; 2 boys and 2 girls; 3 boys and 1 girl; 1
boy and 3 girls. A child from each family is invited to a party. Find the probability (i) that
only girls turn up for the party, (ii) that two girls and one boy turn up for the party.


6.51 The odds that a book will be favourably reviewed by three independent critics are 3 to
3 and 2 to 3 respectively. What is the probability that of three reviews a majority
favourable?


Hint: If we are given the odds that an event A will occur, as a to b, then

$P(A) = \dfrac{a}{a+b}$ and $P(\bar{A}) = \dfrac{b}{a+b}$.


6.52 a) A can hit a target four times in 5 shots; B three times in 4 shots; C twice in 3
fire a volley. What is the probability that two shots at least hit?


b) A committee of three, A, B and C, make a decision on the basis of a
vote. What is the probability of a decision by the committee if the probabilities of a
wrong decision by each member are P(A) = 0.05, P(B) = 0.05, and P(C) = 0.10?

Aye (P.U., BAB 
1 


1 1 
eee a ea and zi Each : 


p= 


the target. 
i) Find the probability that exactly one of them hits the target.


ii) If only one hits the target, what is the probability that it was the first man?
(P.U., B


Three missiles are fired at a target. If the probabilities of hitting the target are 0.4, 0
respectively, and if the missiles are fired independently, what is the probability

i) that all the missiles hit the target?
ii) that at least one of the three hits the target?
iii) that exactly one hits the target?
iv) that exactly 2 hit the target?


A and B play 12 games of chess, of which 6 are won by A, 4 are won by B, and 2
They agree to play a tournament consisting of 3 games. Find the probability that
three games, (b) two games end in a tie, (c) A and B win alternately, (d) B wins
game. (P.U., B

The contents of two urns are as follows:

Urn A contains 3 red and 2 white balls.
Urn B contains 2 red and 5 white balls.

An urn is selected at random; a ball is drawn and put into the other urn; then a ball is drawn
from the second urn. Find the probability that both balls drawn are of the same colour.
Hint: Construct the tree diagram.

A man invited 5 friends. He was born in April as also all the invited friends. What is the
probability that none of the friends was born on the same day of the month as the host?
(C.S.S., 1962)


a) What are respective chances of winning of A and B who toss a coin alternately on the
understanding that the first to obtain heads, wins the toss?


b) Three men toss in succession for a prize to be given to the one who first obtains heads.
Show that their respective chances of winning are 4/7, 2/7 and 1/7.


a) Find the probability of getting exactly 4 heads when 6 coins are tossed.
(B.Z.U., B.A./B.Sc. 1976)


b) Sixteen coins are tossed once. What is the probability of obtaining (i) exactly 8 heads,
(ii) exactly 11 heads?


The national pass rate for ап examination A school enters 6 candidates, Calculate the 
probability that (i) 2 candidates will (ii) 5 candidates will pass Explain why the 
probability of all passing is not x е probability of all failing. 


a) State and prove Bayes' theorem.


b) Three urns of the same appearance have the following proportions of white and black
balls.
Um 4: Дене, 2 black balls. 
Um ite, 1 black ball. 
Urn C: 2 white, 2 black balls.


One of the urns is selected and a ball is drawn from it. It turns out to be white. What is the
probability that urn C was chosen? (P.U., B.A. (Hons.) Part III, 1965; B.A./B.Sc. 2007)


There are three coins, identical in appearance, one of which is ideal and the other two biased 
2 

with probabilities E and * respectively for à head. One coin is taken at random and tossed 

twice. If a head appears both the times, what is the probability that the ideal coin was chosen? 


In a certain college, 4% of the men and 1% of the women are taller than 6 feet. Furthermore,
60% of the students are women. Now if a student is selected at random and is taller than 6
feet, what is the probability that the student is a woman? (P.U., B.A. Hons. 1974)


Three cooks A, B and C, bake a special kind of cake, and with respective probabilities 0.02,
0.03 and 0.05 it fails to rise. In the restaurant where they work, A bakes 50 percent of these
cakes, B 30 percent, and C 20 percent. What proportion of "failures" is caused by A?

The stock of a warehouse consists of boxes of high, medium and low quality lightbulbs in
respective proportions 1:2:2. The probabilities of bulbs of three types being unsatisfactory are
0.0, 0.1 and 0.2 respectively. If a box is chosen at random and two bulbs in it are tested and
found to be satisfactory, what is the probability that it contains bulbs (i) of high quality, (ii) of
medium quality, (iii) of low quality?


A patient is thought to have one of three diseases $A_1$, $A_2$ and $A_3$ whose probabilities under
given conditions are > i and — ; respectively. A test is carried out to help the diagnosis and
yields a positive result with a probability of 0.1 for disease $A_1$, a probability of 0.2 for disease
$A_2$ and a probability of 0.9 for disease $A_3$. The test is conducted 5 times and the results are
positive 4 times and negative once. What is the probability of each disease after testing?
(P.U., B.A. (Hons.) Part-III, 1

RANDOM VARIABLES


7.1 INTRODUCTION


Usually, we are not interested in a particular outcome of a random experiment but our interest
lies in some numerical description of the outcome. For example, when two coins are tossed, we may
be interested only in the number of heads which appear and not in the actual sequence of heads and tails
which is not a numerical quantity. To express the outcomes in numbers, we assign to each non-numerical
outcome of the sample space S = {HH, HT, TH, TT}, one of the numbers i (i = 0, 1, 2) corresponding to
the number of heads appearing. That is, we express the outcomes in terms of numerical values as


Domain or Sample Space ($E_i$):                (HH)  (HT)  (TH)  (TT)
Range or Corresponding Value $X = f(E_i)$:      2     1     1     0


Again in the experiment of throwing a pair of dice, if we are interested only in the sum of the dots
on the upper faces of the two dice and not in the particular dots, we assign to each possible outcome of
the experiment, one of numbers i (i = 2, 3, 4, ..., 12) corresponding to the sum of dots appearing on their
upper faces.


The numbers 0, 1, 2 and 2, 3, 4, ..., 12 in the above cited examples, are random
quantities determined by the outcomes of the random experiment. Such a numerical quantity whose
value is determined by the outcome of a random experiment is called a random variable.
Technically, we assign a single real number to each outcome of the sample space, and hence we state
that a random variable is a real-valued function defined on a sample space. Thus the number of heads
obtained in tossing of two coins and the sum of the dots obtained with a pair of dice in the above
examples are the values of random variables.
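
The statement that a random variable is a function defined on a sample space can be made concrete in code. The following Python sketch is an illustration added here; the names `number_of_heads` and `sum_of_dots` are chosen for this example only. It lists the two sample spaces used above and applies a function to every outcome.

```python
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT.
coin_space = ["".join(toss) for toss in product("HT", repeat=2)]

def number_of_heads(outcome: str) -> int:
    """Random variable X: the number of heads in the outcome."""
    return outcome.count("H")

coin_values = {outcome: number_of_heads(outcome) for outcome in coin_space}
print(coin_values)   # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}

# Sample space for throwing a pair of dice: 36 ordered pairs.
dice_space = list(product(range(1, 7), repeat=2))

def sum_of_dots(outcome: tuple) -> int:
    """Random variable Y: the sum of dots on the two upper faces."""
    return outcome[0] + outcome[1]

# The set of values taken by Y is 2, 3, ..., 12.
dice_range = sorted({sum_of_dots(outcome) for outcome in dice_space})
print(dice_range)
```

Several outcomes may map to the same value (HT and TH both give 1), but each outcome is assigned exactly one number, which is what makes the assignment a function.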


It should be noted that in the definition, a function has been named a random variable. This
terminology, though inappropriate and unfortunate, is universally accepted and used.


A random variable is also called a chance variable, a stochastic variable or simply a variate and is
abbreviated as r.v. The random variables are usually denoted by capital Latin letters such as X, Y, Z;
while the values taken by them are represented by the corresponding small letters such as x, y, z. It is to
be noted that more than one r.v. can be defined on the same sample space. There are two types of
r.v., the discrete and the continuous.


7.2 DISTRIBUTION FUNCTION


The distribution function of a random variable X, denoted by F(x), is defined by $F(x) = P(X \le x)$.
The function F(x) gives the probability of the event that X takes a value less than or equal to a specified
value x. The distribution function is abbreviated to d.f. and is also called the cumulative distribution function (cdf)
as it is the cumulative probability function of X from the smallest up to specific values of x.


Since F(x) is a probability, it is quite obvious that
$F(-\infty) = P(\phi) = 0$ and $F(+\infty) = P(S) = 1$.


Let a and b be two real numbers such that a < b. Then the probability of the interval (a, b] is
$F(b) - F(a) = P(X \le b) - P(X \le a)$
$= P(a < X \le b)$,
which is non-negative and hence F(x) is a non-decreasing function of x.
Again $\lim_{h \to 0^+} F(x + h) = F(x)$, i.e. the function F(x) is continuous on the right at each x.


A d.f. F(x) thus has the following properties:
i) $F(-\infty) = 0$, $F(+\infty) = 1$.
ii) F(x) is a non-decreasing function of x, i.e. $F(x_1) \le F(x_2)$ if $x_1 \le x_2$.
iii) F(x) is continuous at least on the right of each x.
All random variables have distribution functions. Distribution functions for different random variables are
distinguished by using the notation $F_X$, $F_Y$, etc.
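
The properties of a distribution function can be checked on a small case. The sketch below is an added illustration, assuming X is the number of heads in two tosses of a fair coin, so that X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4; the names `probabilities`, `F` and `prob_interval` are chosen here and are not the book's notation.

```python
from fractions import Fraction

# Assumed example: X = number of heads in two tosses of a fair coin.
probabilities = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum((p for value, p in probabilities.items() if value <= x),
               Fraction(0))

def prob_interval(a, b):
    """P(a < X <= b), computed as F(b) - F(a)."""
    return F(b) - F(a)

print(F(-1), F(0), F(1), F(2))   # 0 1/4 3/4 1
print(prob_interval(0, 2))       # 3/4, i.e. P(X = 1) + P(X = 2)
```

Evaluating F just to the left of a possible value, for example F(0.999) against F(1), shows the jump, while F(1) and F(1.5) agree, which illustrates continuity on the right.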
7.3 DISCRETE RANDOM VARIABLE AND ITS Ay DISTRIB 


Y A random variable X is defined to be discrete if it og)'Sssume values which are finite 
infinite. When X takes on a finite number of valu Rey may be listed as xi, Xo, x 
countably infinite case, the values may be listed 2. Хз, os, Хән. The number of 
coin tossing experiments, the number of defect items observed in a consignment, the 
accidents, the number of bacteria in 1сс of ya etc. are the examples of discrete r.v. 

Lat X be a discrete r.v. taking o nct values хі, x», ..., Xm ... Then the function, 
or f(x), and defined by Y 
" 
fix) = Р(Х = x) К -1,2,...,,2. 
-0, Sior x* x; 

is called the probabili ction (pf) of the г.у, X, and the values x;, Xz ...‚ x, .. among 
probability 1 is distri are called the probability points or jump points, Р(Х = x) 
probability that the discrete r.v. X takes the value x,. 


The distribution function for a discrete r.v. is
$F(x) = \sum_{x_i \le x} f(x_i)$,
where the sum is taken over all $x_i$ that are less than or equal to x.
Letting x tend to $+\infty$, we have $F(+\infty) = \sum_i f(x_i) = 1$.

F(x) in case of a discrete r.v. is a step function. That is, its graph consists of horizontal line segments between any two successive values and has a step or jump of height $f(x_i)$ at each $x_i$ (see figure on page 229). It should be noted that F(x) is discontinuous at the jump points, but between jumps it is constant.

[Figure: step graph of F(x) against x, with jumps at the points $x_1, x_2, \ldots, x_n$]


A discrete r.v. may also be defined as a r.v. whose d.f. jumps at the possible values of X and is constant between adjacent jump points. The height of a jump at each point $x_i$ is the probability of $X = x_i$, i.e.

$f(x_i) = [\text{jump at } x_i] = P(X = x_i)$.

The set whose elements are the ordered pairs $[x_i, f(x_i)]$, $i = 1, 2, \ldots$ defines the probability distribution of the r.v. X. Some writers do not make any distinction between the terms probability function and probability distribution but use them interchangeably. The probability distribution of a r.v. may conveniently be expressed either in a tabular form showing all the possible values of X and the associated probabilities $f(x_i) = P(X = x_i)$ as

Value of X     x_1      x_2      ...   x_i      ...
Probability    f(x_1)   f(x_2)   ...   f(x_i)   ...

or in the form of an equation for f(x) with a list of the possible values of X.

The graph of a probability function is obtained by locating the values $x_1, x_2, \ldots, x_n, \ldots$ along the x-axis and drawing vertical lines of heights equal to $f(x_1), f(x_2), \ldots, f(x_n), \ldots$ above them. A probability distribution can also be graphically displayed by a probability histogram.

A probability distribution must satisfy the following two basic properties of probability:
i) $f(x_i) \ge 0$, for all i.
ii) $\sum_i f(x_i) = 1$.

In other words, the probability of an outcome $(x_i)$ is greater than or equal to zero and the sum of the probabilities associated with all possible outcomes must be unity.


Example 7.1 Find the probability distribution and distribution function for the number of heads when 3 balanced coins are tossed. Construct a probability histogram and a graph of the distribution function.


The equiprobable sample space for this experiment is
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
Let X be the r.v. that denotes the number of heads. Then the values of X are 0, 1, 2 and 3, and their probabilities are:
$f(0) = P(X = 0) = P[\{TTT\}] = \frac{1}{8}$,
$f(1) = P(X = 1) = P[\{HTT, THT, TTH\}] = \frac{3}{8}$,
$f(2) = P(X = 2) = P[\{HHT, HTH, THH\}] = \frac{3}{8}$,
$f(3) = P(X = 3) = P[\{HHH\}] = \frac{1}{8}$.


Putting this information in the tabular form, we obtain the desired probability distribution as:

Number of heads (x_i)    0     1     2     3
Probability f(x_i)       1/8   3/8   3/8   1/8

To obtain a formula, we need expressions for the numerators and denominators for all values. Now, x heads can be selected out of 3 in $\binom{3}{x}$ ways and the total number of outcomes is 8. Therefore the desired probability distribution in the form of an equation is

$f(x) = P(X = x) = \frac{1}{8}\binom{3}{x}$, for x = 0, 1, 2, 3.


For the distribution function, we compute the probabilities as below:
If x < 0, we have $P(X \le x) = 0$.
If $0 \le x < 1$, we have $P(X \le x) = P(X = 0) = \frac{1}{8}$.
For $1 \le x < 2$, we have
$P(X \le x) = P(X = 0) + P(X = 1) = \frac{1}{8} + \frac{3}{8} = \frac{4}{8}$.
Similarly, for $2 \le x < 3$, we have
$P(X \le x) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$.
Finally for $x \ge 3$, we have
$P(X \le x) = \sum_{i=0}^{3} P(X = i) = 1$.
Hence the desired distribution function is
$F(x) = 0$, for $x < 0$
$= \frac{1}{8}$, for $0 \le x < 1$
$= \frac{4}{8}$, for $1 \le x < 2$
$= \frac{7}{8}$, for $2 \le x < 3$
$= 1$, for $x \ge 3$.


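The probability histogram is obtained by plotting the points (x, f(x)), while the graph of the distribution function is obtained by plotting the points [x, F(x)], as shown below:

[Figure: probability histogram of f(x) and step graph of F(x), with vertical axis marked at 2/8, 4/8 and 6/8 and horizontal axis x]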
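The coin-tossing calculation above can be sketched in a few lines of Python. This is only an illustrative check, not part of the original text: the names `outcomes`, `f` and `F` are chosen here for convenience, and exact fractions are used so the results can be compared with the table directly.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of tossing 3 balanced coins.
outcomes = list(product("HT", repeat=3))

# Probability function f(x) = P(X = x), where X is the number of heads.
f = {
    x: Fraction(sum(1 for o in outcomes if o.count("H") == x), len(outcomes))
    for x in range(4)
}

def F(x):
    """Distribution function F(x) = P(X <= x): add f over all points <= x."""
    return sum((p for xi, p in f.items() if xi <= x), Fraction(0))

# F is a step function: constant between jump points, jumping by f(x_i) at x_i.
for x in (-1, 0, 0.5, 1, 2, 2.9, 3, 10):
    print(x, F(x))
```

Evaluating `F` at points between the integers shows the step behaviour: the value only changes when x passes one of the jump points 0, 1, 2, 3.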


Use the probability distribution to find the probabilities of obtaining (i) a sum of 8 or 11, (ii) a sum that is greater than 8, (iii) a sum that is greater than 5 but less than or equal to 10.


The sample space S for the experiment of throwing two dice contains 36 sample points, which are equally likely, i.e. each point has probability $\frac{1}{36}$.




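Let X be the random variable representing the sum of dots which appear on the dice. Then the values of the r.v. are 2, 3, 4, ..., 12. The probabilities of these values are computed as below:

$f(2) = P(X = 2) = P[\{(1, 1)\}] = \frac{1}{36}$, as there is only one point resulting in a sum of 2,
$f(3) = P(X = 3) = P[\{(1, 2), (2, 1)\}] = \frac{2}{36}$,
$f(4) = P(X = 4) = P[\{(1, 3), (2, 2), (3, 1)\}] = \frac{3}{36}$.
Similarly, $f(5) = \frac{4}{36}$, $f(6) = \frac{5}{36}$, $f(7) = \frac{6}{36}$, $f(8) = \frac{5}{36}$, $f(9) = \frac{4}{36}$,
$f(10) = \frac{3}{36}$, $f(11) = \frac{2}{36}$ and $f(12) = \frac{1}{36}$.
Therefore the desired probability distribution of the r.v. X is

x_i      2     3     4     5     6     7     8     9     10    11    12
f(x_i)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36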


(b) Using the probability distribution, we get the required probabilities as follows:
i) P(a sum of 8 or 11) = P[(X = 8) or (X = 11)]
= P(X = 8) + P(X = 11)
= $f(8) + f(11) = \frac{5}{36} + \frac{2}{36} = \frac{7}{36}$.
ii) P(a sum that is greater than 8)
= P(X > 8)
= P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
= f(9) + f(10) + f(11) + f(12)
= $\frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36}$.
iii) P(a sum that is greater than 5 but less than or equal to 10)
= $P(5 < X \le 10)$
= P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)
= f(6) + f(7) + f(8) + f(9) + f(10)
= $\frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} + \frac{3}{36} = \frac{23}{36}$.

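As a rough check of the two-dice example, the distribution of the sum can be built by counting sample points. The helper name `prob` and the use of lambda predicates for the events are assumptions made for this sketch only.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely points give each sum.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Probability function of X = sum of dots.
f = {s: Fraction(c, 36) for s, c in counts.items()}

def prob(event):
    """P(event) = sum of f(s) over all sums s for which event(s) is true."""
    return sum((p for s, p in f.items() if event(s)), Fraction(0))

p_8_or_11 = prob(lambda s: s in (8, 11))   # a sum of 8 or 11
p_gt_8 = prob(lambda s: s > 8)             # a sum greater than 8
p_5_to_10 = prob(lambda s: 5 < s <= 10)    # greater than 5, at most 10

print(p_8_or_11, p_gt_8, p_5_to_10)
```

Note that `Fraction` reduces automatically, so 10/36 is displayed in its lowest terms.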

CONTINUOUS RANDOM VARIABLE AND 
ITS PROBABILITY DENSITY FUNCTION 


A random variable X is defined to be continuous if it can assume every possible value in an interval [a, b], a < b, where a and b may be $-\infty$ and $+\infty$ respectively. The height of a person, the temperature at a place, the amount of rainfall, time to failure for an electronic system, etc. are examples of continuous random variable.


A r.v. X may also be defined as continuous if its d.f. F(x) is continuous and is differentiable everywhere except at isolated points in the given range. The graph of F(x) has no jumps or steps but is a continuous function for all x.

Let the derivative of F(x) be denoted by f(x), i.e.

$\frac{d}{dx} F(x) = f(x)$.

Since F(x) is a non-decreasing function of x, we have
$f(x) \ge 0$, and

$F(x) = \int_{-\infty}^{x} f(t)\,dt$, for all x.

The function f(x) is called the probability density function, abbreviated to p.d.f., or simply density function of the r.v. X.


A p.d.f. has the following properties:
i) $f(x) \ge 0$, for all x.
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
iii) The probability that X takes on a value in the interval (c, d], c < d, is given by

$P(c < X \le d) = F(d) - F(c)$
$= \int_{-\infty}^{d} f(x)\,dx - \int_{-\infty}^{c} f(x)\,dx$
$= \int_{c}^{d} f(x)\,dx$,
which is the area under the curve y = f(x) between X = c and X = d (see figure above).

In other words, f(x) is a non-negative function, the integration takes place over all possible values of X between the specified limits and the probabilities are given by appropriate areas under the curve.

Also $P(x < X \le x + dx) = F(x + dx) - F(x) = f(x)\,dx$.
The quantity f(x) dx is called the probability differential or probability element (p.e.) of X.

Since $P(X = k) = \int_{k}^{k} f(x)\,dx = 0$,
it should therefore be noted that the probability of a continuous r.v. X taking any particular value k is always zero. That is why probability for a continuous r.v. is measurable only over a given interval.

Further, since for a continuous r.v. X, P(X = x) = 0 for every x, the following four probabilities are regarded the same:
$P(c \le X \le d)$, $P(c \le X < d)$, $P(c < X \le d)$ and $P(c < X < d)$.
They may be different with a discrete r.v.


The values (expressed as intervals) of a continuous r.v. and their associated probabilities can be shown either in a tabular form or expressed by means of a formula.


Example 7.3 (a) Find the value of k so that the function f(x) defined as follows, may be a density function
$f(x) = kx$, $0 \le x \le 2$
$= 0$, elsewhere.
(b) Find also the probability that both of two sample values will exceed 1.
(c) Compute the distribution function F(x).


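a) The function f(x) will be a density function, if
i) $f(x) \ge 0$ for every x, and
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
The first condition is satisfied when $k \ge 0$. The second condition will be satisfied, if
$1 = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx + \int_{2}^{\infty} f(x)\,dx$,
i.e. if $1 = \int_{-\infty}^{0} 0\,dx + \int_{0}^{2} kx\,dx + \int_{2}^{\infty} 0\,dx$,
i.e. if $1 = 0 + k\left[\frac{x^2}{2}\right]_0^2 + 0 = 2k$.
This gives $k = \frac{1}{2}$.
Hence $f(x) = \frac{x}{2}$, for $0 \le x \le 2$
$= 0$, elsewhere.

b) P(X > 1) = area of shaded region
$= \int_{1}^{2} \frac{x}{2}\,dx = \left[\frac{x^2}{4}\right]_1^2 = \frac{3}{4}$.
P(two sample values exceeding one) $= \frac{3}{4} \times \frac{3}{4} = \frac{9}{16}$.

c) To compute the distribution function, we find
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.
For any x such that $-\infty < x < 0$, $F(x) = \int_{-\infty}^{x} 0\,dt = 0$.
If $0 \le x < 2$, we have $F(x) = \int_{-\infty}^{0} 0\,dt + \int_{0}^{x} \frac{t}{2}\,dt = \frac{x^2}{4}$,
and for $x \ge 2$, we have $F(x) = \int_{-\infty}^{0} 0\,dt + \int_{0}^{2} \frac{t}{2}\,dt + \int_{2}^{x} 0\,dt = 1$.
Hence
$F(x) = 0$, for $x < 0$
$= \frac{x^2}{4}$, for $0 \le x < 2$
$= 1$, for $x \ge 2$.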
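The density f(x) = x/2 on [0, 2] can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate` and its step count `n` are assumptions of this illustration, not something taken from the text.

```python
def integrate(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, k=0.5):
    """Density f(x) = k*x on [0, 2] and 0 elsewhere, with k = 1/2."""
    return k * x if 0 <= x <= 2 else 0.0

def F(x):
    """Distribution function obtained by integrating the density up to x."""
    if x < 0:
        return 0.0
    if x >= 2:
        return 1.0
    return integrate(f, 0, x)

total = integrate(f, 0, 2)       # should be close to 1
p_gt_1 = integrate(f, 1, 2)      # P(X > 1), close to 3/4
p_both = p_gt_1 ** 2             # both of two independent values exceed 1

print(total, p_gt_1, p_both, F(1.0))
```

Squaring `p_gt_1` relies on the two sample values being independent, which is what the example takes for granted.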


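Example 7.4 A r.v. X is of continuous type with p.d.f.
$f(x) = 2x$, $0 < x < 1$,
$= 0$, elsewhere.
Find (i) $P(X = \frac{1}{2})$, (ii) $P(X \le \frac{1}{2})$, (iii) $P(X > \frac{1}{4})$, (iv) $P(\frac{1}{4} \le X \le \frac{1}{2})$, (v) $P(X \le \frac{1}{2} \mid \frac{1}{3} \le X \le \frac{2}{3})$.

Clearly $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} 2x\,dx = 1$.
i) Since f(x) is a continuous probability function, therefore
$P(X = \frac{1}{2}) = 0$.
ii) $P(X \le \frac{1}{2}) = \int_{-\infty}^{0} 0\,dx + \int_{0}^{1/2} 2x\,dx = 0 + \left[x^2\right]_0^{1/2} = \frac{1}{4}$.
iii) $P(X > \frac{1}{4}) = \int_{1/4}^{1} 2x\,dx + \int_{1}^{\infty} 0\,dx = \left[x^2\right]_{1/4}^{1} = \frac{15}{16}$.
iv) $P(\frac{1}{4} \le X \le \frac{1}{2}) = \int_{1/4}^{1/2} 2x\,dx = \left[x^2\right]_{1/4}^{1/2} = \frac{3}{16}$.
v) Applying the definition of conditional probability, we get
$P(X \le \frac{1}{2} \mid \frac{1}{3} \le X \le \frac{2}{3}) = \frac{P(\frac{1}{3} \le X \le \frac{1}{2})}{P(\frac{1}{3} \le X \le \frac{2}{3})} = \frac{\int_{1/3}^{1/2} 2x\,dx}{\int_{1/3}^{2/3} 2x\,dx} = \frac{\left[x^2\right]_{1/3}^{1/2}}{\left[x^2\right]_{1/3}^{2/3}} = \frac{5}{36} \times \frac{3}{1} = \frac{5}{12}$.

Example 7.5 A continuous r.v. X has the d.f. F(x) as follows:
$F(x) = 0$, for $x < 0$,
$= \frac{2x^2}{5}$, for $0 \le x < 1$,
$= -\frac{3}{5} + \frac{2}{5}\left(3x - \frac{x^2}{2}\right)$, for $1 \le x < 2$,
$= 1$, for $x \ge 2$.
Find the p.d.f. and P(|X| < 1.5).

By definition, we have $f(x) = \frac{d}{dx} F(x)$.
Therefore $f(x) = \frac{4}{5}x$, for $0 \le x < 1$
$= \frac{2}{5}(3 - x)$, for $1 \le x < 2$
$= 0$, elsewhere.

P(|X| < 1.5) = P(-1.5 < X < 1.5)
$= \int_{-1.5}^{0} 0\,dx + \int_{0}^{1} \frac{4x}{5}\,dx + \int_{1}^{1.5} \frac{2}{5}(3 - x)\,dx$
$= \left[\frac{2x^2}{5}\right]_0^1 + \frac{2}{5}\left[3x - \frac{x^2}{2}\right]_1^{1.5}$
= 0.40 + 0.35 = 0.75.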


7.5 JOINT DISTRIBUTIONS


The distribution of two or more random variables which are observed simultaneously when an experiment is performed, is called their joint distribution. It is customary to call the distribution of a single r.v. as univariate. Likewise, a distribution involving two, three or many r.v.'s simultaneously is known as bivariate, trivariate or multivariate.


7.5.1 Bivariate Distribution Function. Let X and Y be two r.v.'s defined on the same sample space S. Then the function F(x, y) defined by $F(x, y) = P(X \le x \text{ and } Y \le y)$, where F(x, y) gives the probability that X will take on a value less than or equal to x and, at the same time, Y will take on a value less than or equal to y, is called a bivariate or joint distribution function of X and Y.

A bivariate d.f. F(x, y) possesses the following properties:

i) $F(x, -\infty) = F(-\infty, y) = 0$, $F(+\infty, +\infty) = 1$.

ii) F(x, y) is a non-decreasing function of x and y, and is continuous on the right.

iii) If $x_1 < x_2$ and $y_1 < y_2$, then

$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = P(X \le x_2, Y \le y_2) - P(X \le x_1, Y \le y_2) - P(X \le x_2, Y \le y_1) + P(X \le x_1, Y \le y_1)$
$= F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0$.

The probability that a random point (X, Y) falls in the interval ($x_1 < X \le x_2$; $y_1 < Y \le y_2$) is shown graphically.

A bivariate distribution may be discrete when the possible values of (X, Y) are finite or countably infinite. It is continuous if (X, Y) can assume all values in some non-countable set of the plane. A bivariate distribution is said to be mixed when one r.v. is discrete and the other is continuous.

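7.5.2 Bivariate Probability Function. Let X and Y be two discrete r.v.'s defined on the same sample space S, X taking the values $x_1, x_2, \ldots, x_m$ and Y taking the values $y_1, y_2, \ldots, y_n$. Then the probability that X takes on the value $x_i$ and, at the same time, Y takes on the value $y_j$, denoted by $f(x_i, y_j)$ or $p_{ij}$, is defined to be the joint probability function or simply the joint distribution of X and Y. Thus the joint probability function, also called the bivariate probability function f(x, y), is a function whose value at the point $(x_i, y_j)$ is given by

$f(x_i, y_j) = P(X = x_i \text{ and } Y = y_j)$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$.

The joint or bivariate probability distribution consisting of all pairs of values $(x_i, y_j)$ and their associated probabilities $f(x_i, y_j)$ may either be shown in a two-way table

X \ Y        y_1          y_2          ...   y_n          P(X = x_i)
x_1          f(x_1,y_1)   f(x_1,y_2)   ...   f(x_1,y_n)   g(x_1)
x_2          f(x_2,y_1)   f(x_2,y_2)   ...   f(x_2,y_n)   g(x_2)
...          ...          ...          ...   ...          ...
x_m          f(x_m,y_1)   f(x_m,y_2)   ...   f(x_m,y_n)   g(x_m)
P(Y = y_j)   h(y_1)       h(y_2)       ...   h(y_n)       1

or be expressed by means of a formula for f(x, y). The probabilities f(x, y) can be obtained by substituting appropriate values of x and y in the table or formula.


A joint probability function has the following properties:
i) $f(x_i, y_j) \ge 0$, for all $(x_i, y_j)$, i.e. for i = 1, 2, ..., m; j = 1, 2, ..., n.
ii) $\sum_i \sum_j f(x_i, y_j) = 1$.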

7.5.3 Marginal Probability Functions. From the joint probability function for (X, Y), we can obtain the individual probability function of X and Y. Such individual probability functions are called marginal probability functions.

Let f(x, y) be the joint probability function of two discrete r.v.'s X and Y. Then the marginal probability function of X is defined as

$g(x_i) = \sum_{j=1}^{n} f(x_i, y_j)$
$= f(x_i, y_1) + f(x_i, y_2) + \ldots + f(x_i, y_n)$, as $x_i$ must occur either with $y_1$ or $y_2$ or ... or $y_n$,
$= P(X = x_i)$;
that is, the individual probability function of X is found by adding over the rows of the two-way table.
Similarly, the marginal probability function for Y is obtained by adding over the columns as
$h(y_j) = \sum_{i=1}^{m} f(x_i, y_j) = P(Y = y_j)$.

The values of marginal probabilities are often written in the margins of the joint table as they are the row and column totals in the table. The probabilities in each marginal probability function add to 1.


7.5.4 Conditional Probability Functions. Let X and Y be two r.v.'s with joint probability function f(x, y). Then the conditional probability function for X given $Y = y_j$, denoted as $f(x_i \mid y_j)$, is defined by

$f(x_i \mid y_j) = P(X = x_i \mid Y = y_j)$
$= \frac{P(X = x_i \text{ and } Y = y_j)}{P(Y = y_j)}$
$= \frac{f(x_i, y_j)}{h(y_j)}$, for i = 1, 2, 3, ...; j = 1, 2, 3, ...
where $h(y_j)$ is the marginal probability and $h(y_j) > 0$. It gives the probability that X takes on the value $x_i$ given that Y has taken on the value $y_j$. The conditional probability $f(x_i \mid y_j)$ is non-negative and (for a given $y_j$) adds to 1 on i and hence is a probability function.

Similarly, the conditional probability function for Y given $X = x_i$ is
$f(y_j \mid x_i) = P(Y = y_j \mid X = x_i)$
$= \frac{P(Y = y_j \text{ and } X = x_i)}{P(X = x_i)}$
$= \frac{f(x_i, y_j)}{g(x_i)}$, where $g(x_i) > 0$.

7.5.5 Independence. Two discrete r.v.'s X and Y are said to be statistically independent, if and only if, for all possible pairs of values $(x_i, y_j)$ the joint probability function f(x, y) can be expressed as the product of the two marginal probability functions. That is, X and Y are independent, if

$f(x_i, y_j) = P(X = x_i \text{ and } Y = y_j)$
$= P(X = x_i) \cdot P(Y = y_j)$, for all i and j,
$= g(x_i)\,h(y_j)$.

It should be noted that the joint p.f. of X and Y when they are independent, can be obtained by multiplying together their marginal probability functions.

Example 7.6 An urn contains 3 black, 2 red and 3 green balls and 2 balls are selected at random from it. If X is the number of black balls and Y is the number of red balls selected, then find

i) the joint probability function f(x, y);
ii) $P(X + Y \le 1)$;
iii) the marginal p.d. g(x) and h(y);
iv) the conditional p.d. f(x | 1);
v) P(X = 0 | Y = 1), and
vi) Are X and Y independent?

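i) The sample space S for this experiment contains $\binom{8}{2} = 28$ sample points. The possible values of X are 0, 1 and 2, and those for Y are 0, 1 and 2. The values that (X, Y) can take on are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2) and (2, 0). We desire to find f(x, y) for each value (x, y).
Now f(0, 0) = P(X = 0 and Y = 0), where the event (X = 0 and Y = 0) represents that neither a black nor a red ball is selected, that is, both the balls selected are green. Therefore
$f(0, 0) = \frac{\binom{3}{0}\binom{2}{0}\binom{3}{2}}{\binom{8}{2}} = \frac{3}{28}$.
Again f(0, 1) = P(none is black, 1 is red and 1 is green)
$= \frac{\binom{3}{0}\binom{2}{1}\binom{3}{1}}{\binom{8}{2}} = \frac{6}{28}$.
Similarly, f(1, 1) = P(1 is black, 1 is red and none is green)
$= \frac{\binom{3}{1}\binom{2}{1}\binom{3}{0}}{\binom{8}{2}} = \frac{6}{28}$.
Similar calculations give the probabilities of other values and the joint p.f. of X and Y is given as:

(x, y)     (0, 0)   (0, 1)   (1, 0)   (1, 1)   (0, 2)   (2, 0)
f(x, y)    3/28     6/28     9/28     6/28     1/28     3/28

These probabilities can also be represented in another tabular form as follows:
Joint Probability Distribution

X \ Y     0       1       2       g(x)
0         3/28    6/28    1/28    10/28
1         9/28    6/28    0       15/28
2         3/28    0       0       3/28
h(y)      15/28   12/28   1/28    1

This joint p.d. of the two r.v.'s (X, Y) can be represented by the formula

$f(x, y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{28}$, x = 0, 1, 2; y = 0, 1, 2; $0 \le x + y \le 2$.

ii) To compute $P(X + Y \le 1)$, we see that $x + y \le 1$ for the cells (0, 0), (0, 1) and (1, 0). Therefore
$P(X + Y \le 1) = f(0, 0) + f(0, 1) + f(1, 0) = \frac{3}{28} + \frac{6}{28} + \frac{9}{28} = \frac{18}{28} = \frac{9}{14}$.

iii) The marginal p.d.'s are

x      0       1       2              y      0       1       2
g(x)   10/28   15/28   3/28           h(y)   15/28   12/28   1/28

iv) By definition the conditional p.d. f(x | 1) is
$f(x \mid 1) = P(X = x \mid Y = 1) = \frac{P(X = x \text{ and } Y = 1)}{P(Y = 1)} = \frac{f(x, 1)}{h(1)}$.
Now $h(1) = \sum_x f(x, 1) = \frac{6}{28} + \frac{6}{28} + 0 = \frac{12}{28} = \frac{3}{7}$.
Therefore $f(x \mid 1) = \frac{f(x, 1)}{h(1)} = \frac{7}{3} f(x, 1)$, x = 0, 1, 2.
That is, $f(0 \mid 1) = \frac{7}{3} f(0, 1) = \frac{7}{3} \cdot \frac{6}{28} = \frac{1}{2}$,
$f(1 \mid 1) = \frac{7}{3} f(1, 1) = \frac{7}{3} \cdot \frac{6}{28} = \frac{1}{2}$,
$f(2 \mid 1) = \frac{7}{3} f(2, 1) = \frac{7}{3}(0) = 0$.
Hence the conditional p.d. of X given that Y = 1, is

x          0     1     2
f(x | 1)   1/2   1/2   0

v) Finally, P(X = 0 | Y = 1) = f(0 | 1) = 1/2.
vi) We find that $f(0, 1) = \frac{6}{28}$,
$g(0) = \sum_y f(0, y) = \frac{3}{28} + \frac{6}{28} + \frac{1}{28} = \frac{10}{28}$,
$h(1) = \sum_x f(x, 1) = \frac{6}{28} + \frac{6}{28} + 0 = \frac{12}{28}$.
Now $\frac{6}{28} \ne \frac{10}{28} \times \frac{12}{28}$,
i.e. $f(0, 1) \ne g(0)\,h(1)$,
and therefore X and Y are not statistically independent.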

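Example 7.7 The joint probability distribution of two r.v.'s X and Y is given by

$f(x, y) = \frac{xy^2}{30}$, for x = 1, 2, 3 and y = 1, 2.

Are X and Y independent?
The marginal p.d. for X is

$g(x) = \sum_y f(x, y) = \frac{x(1)^2}{30} + \frac{x(2)^2}{30} = \frac{5x}{30} = \frac{x}{6}$, for x = 1, 2, 3,

and the marginal p.d. for Y is
$h(y) = \sum_x f(x, y) = \frac{y^2}{30} + \frac{2y^2}{30} + \frac{3y^2}{30} = \frac{y^2}{5}$, for y = 1, 2.

Now $g(x)\,h(y) = \frac{x}{6} \cdot \frac{y^2}{5} = \frac{xy^2}{30}$, for x = 1, 2, 3 and y = 1, 2,
i.e. f(x, y) = g(x) h(y).
Hence X and Y are independent.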


7.5.6 Continuous Bivariate Distributions. The bivariate probability density function of continuous r.v.'s X and Y is an integrable function f(x, y) satisfying the following properties:

i) $f(x, y) \ge 0$ for all (x, y).
ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$, and
iii) $P(a \le X \le b,\ c \le Y \le d) = \int_{a}^{b} \int_{c}^{d} f(x, y)\,dy\,dx$.

The distribution function (d.f.) of the bivariate r.v. (X, Y) is defined by

$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$.

It should be noted that analogous to the relationship $\frac{d}{dx} F(x) = f(x)$, we have

$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y)$, wherever F is differentiable.

The marginal p.d.f. of the continuous r.v. X is
$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
and that of the r.v. Y is
$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$.
That is, the marginal p.d.f. of any of the variables is obtained by integrating out the other variable from the joint p.d.f. between the limits $-\infty$ and $+\infty$.

The conditional p.d.f. of the continuous r.v. X given that Y takes the value y, is defined to be

$f(x \mid y) = \frac{f(x, y)}{h(y)}$,

where f(x, y) and h(y) are respectively the joint p.d.f. of X and Y, and the marginal p.d.f. of Y, and h(y) > 0.
Similarly the conditional p.d.f. of the continuous r.v. Y given that X = x, is

$f(y \mid x) = \frac{f(x, y)}{g(x)}$, provided that g(x) > 0.

It is worth noting that the conditional p.d.f.'s satisfy all the requirements for a univariate density function.

Finally, two continuous r.v.'s X and Y are said to be statistically independent, if and only if their joint density f(x, y) can be factored in the form f(x, y) = g(x) h(y) for all possible values of X and Y.

Example 7.8 Given the following joint p.d.f.
$f(x, y) = \frac{1}{8}(6 - x - y)$, $0 \le x \le 2$; $2 \le y \le 4$,
$= 0$, elsewhere.
a) Verify that f(x, y) is a joint density function.
b) Calculate (i) $P(X \le \frac{3}{2},\ Y \le \frac{5}{2})$, (ii) P(X + Y < 3).
c) Find the marginal p.d.f. g(x) and h(y).
d) Find the conditional p.d.f. f(x | y) and f(y | x).
o 


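a) The joint density f(x, y) will be a p.d.f. if
i) $f(x, y) \ge 0$ and
ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
Now f(x, y) is clearly $\ge 0$ for all x and y in the given region, and
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \frac{1}{8} \int_{0}^{2} \int_{2}^{4} (6 - x - y)\,dy\,dx$
$= \frac{1}{8} \int_{0}^{2} \left[6y - xy - \frac{y^2}{2}\right]_2^4 dx$
$= \frac{1}{8} \int_{0}^{2} (6 - 2x)\,dx = \frac{1}{8}\left[6x - x^2\right]_0^2$
$= \frac{1}{8}[12 - 4] = 1$.

Thus f(x, y) has the properties of a joint p.d.f.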


b) (i) To determine the probability of a value of the г.у. (X, Y; falling im the 
we find 


3-x 


Е Ir (x, yy, 


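(ii) $P(X + Y < 3) = \frac{1}{8} \int_{0}^{1} \int_{2}^{3-x} (6 - x - y)\,dy\,dx$ (since $x + y < 3$ and $y \ge 2$ imply $0 \le x < 1$)
$= \frac{1}{8} \int_{0}^{1} \left(\frac{7}{2} - 4x + \frac{x^2}{2}\right) dx = \frac{1}{8} \cdot \frac{5}{3} = \frac{5}{24}$.

[Figure: region of the event X + Y < 3 within the rectangle $0 \le x \le 2$, $2 \le y \le 4$]

c) The marginal p.d.f. of X is
$g(x) = \frac{1}{8} \int_{2}^{4} (6 - x - y)\,dy$, $0 \le x \le 2$
$= \frac{1}{4}(3 - x)$, $0 \le x \le 2$
$= 0$, for $x < 0$ or $x > 2$.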
Similarly, the marginal p.d.f. of Y is

$h(y) = \frac{1}{8} \int_{0}^{2} (6 - x - y)\,dx$, $2 \le y \le 4$
$= \frac{1}{4}(5 - y)$, $2 \le y \le 4$
$= 0$, elsewhere.

d) The conditional p.d.f. of X given Y = y, is

$f(x \mid y) = \frac{f(x, y)}{h(y)}$, where h(y) > 0,
$= \frac{(1/8)(6 - x - y)}{(1/4)(5 - y)} = \frac{6 - x - y}{2(5 - y)}$, $0 \le x \le 2$,

and the conditional p.d.f. of Y given X = x, is

$f(y \mid x) = \frac{f(x, y)}{g(x)}$, where g(x) > 0,
$= \frac{(1/8)(6 - x - y)}{(1/4)(3 - x)} = \frac{6 - x - y}{2(3 - x)}$, $2 \le y \le 4$.

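Example 7.9 Let the two-dimensional random variable (X, Y) have the joint density function given by

$f(x, y) = x^2 + \frac{xy}{3}$, $0 \le x \le 1$, $0 \le y \le 2$,
$= 0$, elsewhere.
a) Check that f(x, y) is a p.d.f.
b) Find the marginal p.d.f.'s.
c) Find the conditional p.d.f.'s and verify that f(x | y) is a p.d.f.

a) The function f(x, y) will be a p.d.f. if
i) $f(x, y) \ge 0$ and
ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
Clearly $f(x, y) \ge 0$ in the given region, and
$\int_{0}^{2} \int_{0}^{1} \left(x^2 + \frac{xy}{3}\right) dx\,dy = \int_{0}^{2} \left[\frac{x^3}{3} + \frac{x^2 y}{6}\right]_0^1 dy = \int_{0}^{2} \left(\frac{1}{3} + \frac{y}{6}\right) dy = \left[\frac{y}{3} + \frac{y^2}{12}\right]_0^2 = \frac{2}{3} + \frac{1}{3} = 1$.

b) The marginal p.d.f. of X is
$g(x) = \int_{0}^{2} \left(x^2 + \frac{xy}{3}\right) dy = \left[x^2 y + \frac{xy^2}{6}\right]_0^2 = 2x^2 + \frac{2}{3}x$, $0 \le x \le 1$,
and the marginal p.d.f. of Y is
$h(y) = \int_{0}^{1} \left(x^2 + \frac{xy}{3}\right) dx = \frac{1}{3} + \frac{y}{6}$, $0 \le y \le 2$.

c) The conditional p.d.f. of X for given Y = y is
$f(x \mid y) = \frac{f(x, y)}{h(y)} = \frac{x^2 + xy/3}{1/3 + y/6} = \frac{6x^2 + 2xy}{2 + y}$, $0 \le x \le 1$, $0 \le y \le 2$.
The conditional p.d.f. of Y for given X = x is
$f(y \mid x) = \frac{f(x, y)}{g(x)} = \frac{x^2 + xy/3}{2x^2 + 2x/3} = \frac{3x^2 + xy}{6x^2 + 2x} = \frac{3x + y}{6x + 2}$, $0 \le y \le 2$, $0 \le x \le 1$.

To verify that the conditional p.d.f. f(x | y) is a p.d.f., we have
$\int_{0}^{1} \frac{6x^2 + 2xy}{2 + y}\,dx = \frac{1}{2 + y}\left[2x^3 + x^2 y\right]_0^1 = \frac{1}{2 + y}(2 + y) = 1$, for all y.
Hence the conditional p.d.f. f(x | y) satisfies the requirements for a univariate density function.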


7.6 MATHEMATICAL EXPECTATION OF A RANDOM VARIABLE 
Let a discrete r.v. X have possible values $x_1, x_2, \ldots, x_n, \ldots$ with corresponding probabilities $f(x_1), f(x_2), \ldots, f(x_n), \ldots$ such that $\sum f(x_i) = 1$. Then the mathematical expectation or the expectation or the expected value of X, denoted by E(X), is defined as
$E(X) = x_1 f(x_1) + x_2 f(x_2) + \ldots + x_n f(x_n) + \ldots$
$= \sum_{i=1}^{\infty} x_i f(x_i)$, provided it converges absolutely.
The sum converges absolutely if and only if $\sum |x_i| f(x_i)$ is finite.

When X takes on only a finite number of values, we have $E(X) = \sum_{i=1}^{n} x_i f(x_i)$, which may be regarded as a weighted mean of the values $x_1, x_2, \ldots, x_n$, each being weighted by its respective probability. In case the values are equally likely, $E(X) = \frac{1}{n} \sum x_i$, which is the ordinary arithmetic mean of the n possible values. E(X) is also called the mean of X and is usually denoted by the letter $\mu$. It should be noted that E(X) is the average value of the r.v. X over a large number of trials.

If the r.v. X is continuous with p.d.f. f(x), then

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, provided the integral converges absolutely, i.e. $\int_{-\infty}^{\infty} |x| f(x)\,dx$ is finite.

Clearly the definition of mathematical expectation in the case of a continuous r.v. is the same with summation being replaced by integral.

Example 7.10 (a) What is the mathematical expectation of the number of heads when three fair coins are tossed?

(b) What is the expectation of the number of failures preceding the first success in an infinite series of independent trials with constant probability of success? (P.
a) Let the r.v. X represent the number of heads when three fair coins are tossed. Then X has the following p.d.

x_i      0     1     2     3
f(x_i)   1/8   3/8   3/8   1/8

Hence the mathematical expectation of X is
$E(X) = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} = \frac{12}{8} = 1.5$.

It should be noted that E(X) is 1.5, which is not an integer and not any value X could actually have. It means that, if the experiment of tossing 3 coins is performed a very large number of times, we expect on the average to get 1.5 heads.

b) Let the r.v. X denote the number of failures preceding the first success. Then as X takes the values 0, 1, 2, 3, ..., the respective probabilities are $p, qp, q^2 p, q^3 p, \ldots$ where q = 1 - p.

Hence $E(X) = x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \ldots$
$= 0 \cdot p + 1 \cdot qp + 2 \cdot q^2 p + 3 \cdot q^3 p + \ldots$
$= qp\,(1 + 2q + 3q^2 + \ldots)$
$= qp\,(1 - q)^{-2} = qp\,(p)^{-2} = \frac{q}{p}$.
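Both parts of this example can be checked with a small expectation helper. The value p = 0.25 and the truncation of the infinite series at 2000 terms are arbitrary choices made for this sketch; the function name `expectation` is likewise hypothetical.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum of x * f(x) over a finite probability distribution."""
    return sum(x * p for x, p in dist.items())

# (a) Number of heads in three tosses of a fair coin.
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
e_heads = expectation(heads)

# (b) Failures before the first success: P(X = x) = q**x * p.
# The series is truncated; the neglected tail is negligible for this p.
p = 0.25
q = 1 - p
e_failures = sum(x * q**x * p for x in range(2000))

print(e_heads, e_failures, q / p)
```

The truncated sum agrees with the closed form q/p to within floating-point accuracy, which illustrates the absolute convergence required in the definition.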
Example 7.11 (a) If it rains, an umbrella salesman can earn $30 per day. If it is fair, he can lose $6 per day. What is his expectation if the probability of rain is 0.3? (P.U., B.A./B.Sc. 1982)

(b) A man draws 2 balls from a bag containing 3 white and 5 black balls. If he receives Rs. 70 for every white ball he draws and Rs. 7 for every black ball, find his expectation. (P.U., B.A./B.Sc. 1987)

(a) Let X represent the number of dollars the salesman earns. Then X is a r.v. with possible values 30 and -6, where -6 corresponds to the fact that the salesman loses, and the corresponding probabilities are 0.3 and 0.7 respectively.

Hence E(X) = 30 x 0.3 + (-6) x 0.7
= $ (9.00 - 4.20) = $ 4.80 per day.

(b) Two balls from a bag containing 3 white and 5 black balls can be drawn in the following three mutually exclusive ways:

i) 2 white balls,
ii) 1 white and 1 black ball,
iii) 2 black balls.
Let $p_i$ denote the probability of drawing 2 balls in the ith way. Then
$p_1 = \frac{\binom{3}{2}}{\binom{8}{2}} = \frac{3}{28}$, $p_2 = \frac{\binom{3}{1}\binom{5}{1}}{\binom{8}{2}} = \frac{15}{28}$, $p_3 = \frac{\binom{5}{2}}{\binom{8}{2}} = \frac{10}{28}$.
Let X denote the amount to be received. Then
$x_1$ = 2 x Rs. 70 + 0 x Rs. 7 = Rs. 140,
$x_2$ = 1 x Rs. 70 + 1 x Rs. 7 = Rs. 77,
$x_3$ = 0 x Rs. 70 + 2 x Rs. 7 = Rs. 14.

Hence the required expectation $= x_1 p_1 + x_2 p_2 + x_3 p_3$
$= 140 \times \frac{3}{28} + 77 \times \frac{15}{28} + 14 \times \frac{10}{28}$
= 15 + 41.25 + 5 = Rs. 61.25.
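Example 7.12 Find the expected value of the r.v. X having the p.d.f.
$f(x) = 2(1 - x)$, $0 < x < 1$
$= 0$, elsewhere.
Now $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = 2 \int_{0}^{1} x(1 - x)\,dx$
$= 2\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1 = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3}$.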

7.6.1 Expectation of a Function of a Random Variable. Let H(X) be a function of the r.v. X. Then H(X) is also a r.v. and also has an expected value, as any function of a r.v. is also a r.v. If X is a discrete r.v. with p.d. f(x), then since H(X) takes the value $H(x_i)$ when $X = x_i$, the expected value of the function H(X) is
$E[H(X)] = H(x_1) f(x_1) + H(x_2) f(x_2) + \ldots + H(x_n) f(x_n)$
$= \sum_i H(x_i) f(x_i)$, provided the series converges absolutely.
Similarly, if X is a continuous r.v. with p.d.f. f(x), then
$E[H(X)] = \int_{-\infty}^{\infty} H(x) f(x)\,dx$, provided the integral exists.
In particular, if $H(X) = X^2$, then $E(X^2) = \sum x_i^2 f(x_i)$.
It is relevant to note that $E(X^2)$ is not the same as $[E(X)]^2$.
Again if $H(X) = (X - \mu)^2$, where $\mu$ is the population mean, then
$E(X - \mu)^2 = \sum (x_i - \mu)^2 f(x_i)$.
We call this expected value the variance and denote it by Var(X) or $\sigma^2$. That is,
$Var(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2$. The positive square root of the variance, as before, is called the standard deviation.

It is useful to note the following important results about variance.
i) Var(X) cannot be negative.
ii) Var(a) = 0, where a is a constant.
iii) $Var(aX) = a^2\,Var(X)$, where a is a constant.
iv) $Var(aX + b) = a^2\,Var(X)$, where a and b are constants.
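The variance results listed above can be illustrated on a small discrete distribution. The distribution used here (number of heads in three coin tosses) and the constants a = 2, b = 5 are arbitrary choices for this sketch, and the helper names are hypothetical.

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X - mu)^2, computed from the definition."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, a, b):
    """Distribution of aX + b; probabilities of equal values are merged."""
    out = {}
    for x, p in dist.items():
        out[a * x + b] = out.get(a * x + b, 0) + p
    return out

X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
a, b = 2, 5

var_x = variance(X)
var_shortcut = mean({x * x: p for x, p in X.items()}) - mean(X) ** 2  # E(X^2) - [E(X)]^2
var_linear = variance(transform(X, a, b))    # should equal a^2 Var(X)
var_const = variance(transform(X, 0, b))     # a constant has zero variance

print(var_x, var_shortcut, var_linear, var_const)
```

Adding the constant b shifts every value and the mean by the same amount, so the deviations from the mean, and hence the variance, are unchanged.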


More generally, if $H(X) = X^k$, k = 1, 2, 3, ..., then
$E(X^k) = \sum x_i^k f(x_i)$,
which we call the kth moment about the origin of the r.v. X and we denote it by $\mu'_k$.


Similarly, if $H(X) = (X - \mu)^k$, k = 1, 2, 3, ..., then we get an expected value, called the kth moment about the mean of the r.v. X, which we denote by $\mu_k$. That is,

$\mu_k = E(X - \mu)^k = \sum (x_i - \mu)^k f(x_i)$.
In case of a continuous r.v., the summations are replaced by integrals.
The skewness is often measured by the ratio $\frac{\mu_3}{\sigma^3}$ and the kurtosis by $\beta_2 = \frac{\mu_4}{\sigma^4}$, as discussed earlier.


Example 7.13 Let X have the following probability distribution:

x_i      1     2     3     4     5
f(x_i)   0.2   0.3   0.2   0.2   0.1

Find the probability functions of 3X - 1, $X^2$ and $X^2 + 2$; and find E(3X - 1), $E(X^2)$ and $E(X^2 + 2)$.
The probability distribution of the r.v. H(X) = 3X - 1, is

3x_i - 1   2     5     8     11    14
f(x_i)     0.2   0.3   0.2   0.2   0.1

$E(3X - 1) = \sum H(x_i) f(x_i) = \sum (3x_i - 1) f(x_i)$
= 2 x 0.2 + 5 x 0.3 + 8 x 0.2 + 11 x 0.2 + 14 x 0.1
= 0.4 + 1.5 + 1.6 + 2.2 + 1.4 = 7.1.
The p.d. of $H(X) = X^2$ is

x_i^2    1     4     9     16    25
f(x_i)   0.2   0.3   0.2   0.2   0.1

and $E(X^2) = \sum x_i^2 f(x_i)$
= 1 x 0.2 + 4 x 0.3 + 9 x 0.2 + 16 x 0.2 + 25 x 0.1
= 0.2 + 1.2 + 1.8 + 3.2 + 2.5 = 8.9.
Similarly, $E(X^2 + 2) = \sum (x_i^2 + 2) f(x_i)$
= 3 x 0.2 + 6 x 0.3 + 11 x 0.2 + 18 x 0.2 + 27 x 0.1
= 0.6 + 1.8 + 2.2 + 3.6 + 2.7 = 10.9.
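The three expectations can be reproduced by applying a function H to each value before weighting by its probability. The sketch assumes X takes the values 1 to 5 with probabilities 0.2, 0.3, 0.2, 0.2, 0.1, as implied by the arithmetic in the example; the helper `expect` is introduced here for illustration.

```python
from fractions import Fraction as Fr

# Assumed distribution of X (values 1..5), written with exact fractions.
dist = {1: Fr(2, 10), 2: Fr(3, 10), 3: Fr(2, 10), 4: Fr(2, 10), 5: Fr(1, 10)}

def expect(H):
    """E[H(X)] = sum of H(x_i) * f(x_i) over the distribution of X."""
    return sum(H(x) * p for x, p in dist.items())

e_linear = expect(lambda x: 3 * x - 1)       # E(3X - 1)
e_square = expect(lambda x: x * x)           # E(X^2)
e_square_plus = expect(lambda x: x * x + 2)  # E(X^2 + 2)
e_x = expect(lambda x: x)                    # E(X)

print(e_linear, e_square, e_square_plus, e_x ** 2)
```

Comparing `e_square` with `e_x ** 2` shows numerically that E(X^2) and [E(X)]^2 are different quantities.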


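Example 7.14 Let X be a r.v. with p.d.f.
$f(x) = 2(x - 1)$, $1 < x < 2$
$= 0$, elsewhere.

Find the expected values of H(X) = 2X - 1 and $H(X) = X^2$.

Now $E(2X - 1) = \int_{-\infty}^{\infty} (2x - 1) f(x)\,dx = 2 \int_{1}^{2} (2x - 1)(x - 1)\,dx$
$= 2 \int_{1}^{2} (2x^2 - 3x + 1)\,dx = 2\left[\frac{2x^3}{3} - \frac{3x^2}{2} + x\right]_1^2$
$= 2\left[\frac{4}{3} - \frac{1}{6}\right] = \frac{7}{3}$,
and $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = 2 \int_{1}^{2} x^2 (x - 1)\,dx$
$= 2\left[\frac{x^4}{4} - \frac{x^3}{3}\right]_1^2 = 2\left[\frac{4}{3} + \frac{1}{12}\right] = \frac{17}{6}$.

Example 7.15 If the continuous r.v. X has p.d.f.
$f(x) = \frac{3}{4}(3 - x)(x - 5)$, $3 \le x \le 5$
$= 0$, elsewhere.
Find the arithmetic mean, variance and standard deviation of X.

Now $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{3}{4} \int_{3}^{5} x(3 - x)(x - 5)\,dx$
$= \frac{3}{4} \int_{3}^{5} (-x^3 + 8x^2 - 15x)\,dx = \frac{3}{4}\left[-\frac{x^4}{4} + \frac{8x^3}{3} - \frac{15x^2}{2}\right]_3^5 = 4$.

Again $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \frac{3}{4} \int_{3}^{5} x^2 (3 - x)(x - 5)\,dx$
$= \frac{3}{4}\left[-\frac{x^5}{5} + 2x^4 - 5x^3\right]_3^5 = 16.2$.
$Var(X) = E(X^2) - [E(X)]^2 = 16.2 - 16 = 0.2$,
$S.D.(X) = \sqrt{0.2} = 0.447$.
Hence mean = 4, variance = 0.2 and standard deviation = 0.447.

Example 7.16 The continuous r.v. X has p.d.f. f(x), where $f(x) = \frac{3}{4}(1 + x^2)$ for $0 \le x \le 1$. If $E(X) = \mu$ and $Var(X) = \sigma^2$, find $P(|X - \mu| < \sigma)$.

Now $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{3}{4} \int_{0}^{1} x(1 + x^2)\,dx = \frac{3}{4}\left[\frac{x^2}{2} + \frac{x^4}{4}\right]_0^1 = \frac{9}{16} = 0.5625$,
$E(X^2) = \frac{3}{4} \int_{0}^{1} x^2 (1 + x^2)\,dx = \frac{3}{4}\left[\frac{x^3}{3} + \frac{x^5}{5}\right]_0^1 = \frac{2}{5}$.
$Var(X) = E(X^2) - [E(X)]^2 = \frac{2}{5} - \left(\frac{9}{16}\right)^2 = \frac{107}{1280} = 0.0836$, so that
S.D.(X) or $\sigma = \sqrt{0.0836} = 0.289$.

Now $P(|X - \mu| < \sigma) = P(-\sigma < X - \mu < \sigma)$
$= P(\mu - \sigma < X < \mu + \sigma)$
= P(0.5625 - 0.289 < X < 0.5625 + 0.289)
= P(0.2735 < X < 0.8515), and

$P(0.2735 < X < 0.8515) = \frac{3}{4} \int_{0.2735}^{0.8515} (1 + x^2)\,dx = \frac{3}{4}\left[x + \frac{x^3}{3}\right]_{0.2735}^{0.8515}$
$= \frac{3}{4}[1.05729 - 0.28032] = 0.5827$.
Hence $P(|X - \mu| < \sigma) = 0.5827$.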
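The moments in Example 7.15 and the interval probability in Example 7.16 can be cross-checked by numerical integration. The densities below are written as (3/4)(3 - x)(x - 5) on [3, 5] and (3/4)(1 + x^2) on [0, 1], which is how they are read in this sketch; `integrate` and `moments` are helper names assumed here.

```python
def integrate(g, a, b, n=20_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def moments(pdf, a, b):
    """Return (mean, variance, standard deviation) of a density on [a, b]."""
    mean = integrate(lambda x: x * pdf(x), a, b)
    second = integrate(lambda x: x * x * pdf(x), a, b)
    var = second - mean ** 2
    return mean, var, var ** 0.5

def f15(x):
    return 0.75 * (3 - x) * (x - 5)   # density read from Example 7.15

def f16(x):
    return 0.75 * (1 + x * x)         # density read from Example 7.16

mean15, var15, sd15 = moments(f15, 3, 5)
mu, var16, sigma = moments(f16, 0, 1)

# P(|X - mu| < sigma) for Example 7.16, using unrounded mu and sigma.
p_within = integrate(f16, mu - sigma, mu + sigma)

print(mean15, var15, sd15)
print(mu, var16, sigma, p_within)
```

With unrounded limits the interval probability comes out near 0.583, in line with (3/4)(1.05729 - 0.28032).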
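Example 7.17 A continuous r.v. X has the p.d.f.

$f(x) = \frac{3}{4} x(2 - x)$, $0 \le x \le 2$,
$= 0$, elsewhere.
Find the first four moments about the mean and the coefficient of skewness.

We first calculate the moments about origin as:
$\mu'_1 = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{3}{4} \int_{0}^{2} (2x^2 - x^3)\,dx = \frac{3}{4}\left[\frac{2x^3}{3} - \frac{x^4}{4}\right]_0^2 = \frac{3}{4}\left[\frac{16}{3} - \frac{16}{4}\right] = 1$,
$\mu'_2 = E(X^2) = \frac{3}{4} \int_{0}^{2} (2x^3 - x^4)\,dx = \frac{3}{4}\left[\frac{x^4}{2} - \frac{x^5}{5}\right]_0^2 = \frac{3}{4}\left[8 - \frac{32}{5}\right] = \frac{6}{5}$,
$\mu'_3 = E(X^3) = \frac{3}{4} \int_{0}^{2} (2x^4 - x^5)\,dx = \frac{3}{4}\left[\frac{2x^5}{5} - \frac{x^6}{6}\right]_0^2 = \frac{3}{4}\left[\frac{64}{5} - \frac{64}{6}\right] = \frac{8}{5}$,
$\mu'_4 = E(X^4) = \frac{3}{4} \int_{0}^{2} (2x^5 - x^6)\,dx = \frac{3}{4}\left[\frac{x^6}{3} - \frac{x^7}{7}\right]_0^2 = \frac{3}{4}\left[\frac{64}{3} - \frac{128}{7}\right] = \frac{16}{7}$.
Then we find the moments about the mean as:
$\mu_1 = 0$,
$\mu_2 = \mu'_2 - (\mu'_1)^2 = \frac{6}{5} - 1 = \frac{1}{5}$,
$\mu_3 = \mu'_3 - 3\mu'_1 \mu'_2 + 2(\mu'_1)^3 = \frac{8}{5} - \frac{18}{5} + 2 = 0$,
$\mu_4 = \mu'_4 - 4\mu'_1 \mu'_3 + 6(\mu'_1)^2 \mu'_2 - 3(\mu'_1)^4 = \frac{16}{7} - \frac{32}{5} + \frac{36}{5} - 3 = \frac{3}{35}$.
Since $\mu_3 = 0$, the coefficient of skewness is zero.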
7.6.2 Properties of Expected Values. The important properties of the expected values are given below:
1. If a is a constant, then E(a) = a.
Proof: Let X be a discrete r.v. with p.d. $[x_i, f(x_i)]$, i = 1, 2, ..., n.

Then E(X) for X = a is given by

$E(a) = \sum_{i=1}^{n} a f(x_i)$
$= a f(x_1) + a f(x_2) + \ldots + a f(x_n)$
$= a[f(x_1) + f(x_2) + \ldots + f(x_n)]$
$= a \sum_i f(x_i) = a$ (since $\sum f(x_i) = 1$).

Thus the expected value of a constant is the constant itself.


2. If X is a discrete r.v. and if a and b are constants, then
E(aX + b) = a E(X) + b.
Proof. Let the p.d. of the r.v. X be $[x_i, f(x_i)]$, i = 1, 2, ..., n.

Then by definition of expected value, we have
$E(aX + b) = \sum_{i=1}^{n} (a x_i + b) f(x_i)$
$= (a x_1 + b) f(x_1) + (a x_2 + b) f(x_2) + \ldots + (a x_n + b) f(x_n)$
$= a[x_1 f(x_1) + \ldots + x_n f(x_n)] + b[f(x_1) + \ldots + f(x_n)]$
$= a \sum x_i f(x_i) + b \sum f(x_i)$
$= a E(X) + b$.

This result is often called the expected value of a linear transformation of the r.v. X.
If b = 0, then E(aX) = a E(X).

If a = 1, $b = -\mu$, then $E(X - \mu) = E(X) - \mu = \mu - \mu = 0$.

That is, the expected value of the deviation of any r.v. from its mean is always equal to zero.


3. The expected value of the sum of any two random variables is equal to the sum of their expected values, i.e.

E(X + Y) = E(X) + E(Y).
Proof. Let X and Y be two discrete r.v.'s defined on the same sample space S. Let X assume m values $x_1, x_2, \ldots, x_m$ with probabilities $g(x_1), g(x_2), \ldots, g(x_m)$ and Y assume n values $y_1, y_2, \ldots, y_n$ with probabilities $h(y_1), \ldots, h(y_n)$. The sum X + Y is a r.v. taking the values $x_i + y_j$ with probabilities $f(x_i, y_j)$ for all combinations of values of i and j. Hence by definition, we have

$E(X + Y) = \sum_i \sum_j (x_i + y_j) f(x_i, y_j)$
$= \sum_i \sum_j x_i f(x_i, y_j) + \sum_i \sum_j y_j f(x_i, y_j)$.

Now $\sum_i \sum_j x_i f(x_i, y_j) = \sum_i x_i \sum_j f(x_i, y_j)$
$= \sum_i x_i [f(x_i, y_1) + f(x_i, y_2) + \ldots + f(x_i, y_n)]$
$= \sum_i x_i g(x_i)$ (since all possible values of y are included in the sum)
$= E(X)$.
Similarly, $\sum_i \sum_j y_j f(x_i, y_j) = \sum_j y_j \sum_i f(x_i, y_j)$
$= \sum_j y_j [f(x_1, y_j) + f(x_2, y_j) + \ldots + f(x_m, y_j)]$
$= \sum_j y_j h(y_j) = E(Y)$.

Hence E(X + Y) = E(X) + E(Y).
It is interesting to note that this result holds in general, that is, the expected value of the sum of any number of r.v.'s is the sum of their expected values. In other words,
$E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n)$,
or $E\left(\sum X_i\right) = \sum E(X_i)$.
The result also holds for the difference of r.v.'s, i.e.
E(X - Y) = E(X) - E(Y).


4. The expected value of the product of two independent r.v.'s is equal to the product of their expected values, i.e.

E(XY) = E(X) E(Y).

Proof. Let the r.v. X assume m values $x_1, \ldots, x_m$ and the r.v. Y, the n values $y_1, y_2, \ldots, y_n$. Let the probability of X assuming the value $x_i$ be $g(x_i)$ and the probability of Y assuming the value $y_j$ be $h(y_j)$. The product XY is a r.v. taking the values $x_i y_j$ with probabilities $f(x_i, y_j)$. As X and Y are independent, the joint p.d. f(x, y) can be factored as $f(x_i, y_j) = g(x_i)\,h(y_j)$.

Hence $E(XY) = \sum_i \sum_j x_i y_j f(x_i, y_j)$
$= \sum_i \sum_j x_i y_j\,g(x_i)\,h(y_j)$
$= \sum_i x_i g(x_i) \sum_j y_j h(y_j)$
$= E(X)\,E(Y)$.
This result can easily be extended to several independent r.v.'s.

It should be noted that these properties are valid for continuous r.v.'s, in which case the summations are replaced by integrals.

Example 7.18 Let X and Y be two discrete r.v.'s with the following joint p.d.

X \ Y    1      3      5
2        0.10   0.20   0.10
4        0.15   0.30   0.15

Find E(X), E(Y), E(X + Y), E(2X - 3Y) and E(XY).

To determine the expected values of X and Y, we first find the marginal p.d. g(x) and h(y) by adding over the columns and rows of the two-way table as below:

X \ Y    1      3      5      g(x)
2        0.10   0.20   0.10   0.40
4        0.15   0.30   0.15   0.60
h(y)     0.25   0.50   0.25   1.00

Now $E(X) = \sum x_i g(x_i)$ = 2 x 0.40 + 4 x 0.60 = 0.80 + 2.40 = 3.2,
$E(Y) = \sum y_j h(y_j)$ = 1 x 0.25 + 3 x 0.50 + 5 x 0.25
= 0.25 + 1.50 + 1.25 = 3.0,
$E(X + Y) = \sum_i \sum_j (x_i + y_j) f(x_i, y_j)$
= (2 + 1)(0.10) + (2 + 3)(0.20) + (2 + 5)(0.10) + (4 + 1)(0.15) + (4 + 3)(0.30) + (4 + 5)(0.15)
= 0.30 + 1.00 + 0.70 + 0.75 + 2.10 + 1.35 = 6.20
= E(X) + E(Y),
E(2X - 3Y) = 2 E(X) - 3 E(Y)
= 2(3.2) - 3(3.0) = -2.6.

As X and Y are independent, therefore
E(XY) = E(X) E(Y) = (3.2)(3.0) = 9.6.
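The expectations in Example 7.18 can be computed straight from the joint distribution, which also gives a direct check of E(XY). The joint probabilities below are the ones implied by the arithmetic of the example; the helper `E` is a name chosen for this sketch.

```python
from fractions import Fraction as Fr

# Joint probabilities f(x, y) implied by the worked arithmetic.
joint = {
    (2, 1): Fr(10, 100), (2, 3): Fr(20, 100), (2, 5): Fr(10, 100),
    (4, 1): Fr(15, 100), (4, 3): Fr(30, 100), (4, 5): Fr(15, 100),
}

def E(H):
    """E[H(X, Y)] = sum of H(x, y) * f(x, y) over all cells of the table."""
    return sum(H(x, y) * p for (x, y), p in joint.items())

e_x = E(lambda x, y: x)
e_y = E(lambda x, y: y)
e_sum = E(lambda x, y: x + y)
e_lin = E(lambda x, y: 2 * x - 3 * y)
e_prod = E(lambda x, y: x * y)

print(e_x, e_y, e_sum, e_lin, e_prod)
```

Here `e_prod` is obtained without assuming independence, so its agreement with `e_x * e_y` is a genuine check rather than a restatement of property 4.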


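Example 7.19 X and Y are two independent r.v.'s such that
$g(x) = \frac{1}{3}$ for x = 1, 2, 3 and $h(y) = \frac{1}{2}$ for y = 0, 1.

If Z = 2X - Y, then verify that E(Z) = 2E(X) - E(Y).

The joint distribution of the two independent r.v.'s is $f(x, y) = g(x)\,h(y) = \frac{1}{6}$ for x = 1, 2, 3 and y = 0, 1.

Now $E(X) = \sum x\,g(x) = (1 + 2 + 3)\frac{1}{3} = 2$,
$E(Y) = \sum y\,h(y) = (0 + 1)\frac{1}{2} = \frac{1}{2}$, so that
$E(Z) = E[2X - Y] = 2E(X) - E(Y) = 2 \times 2 - \frac{1}{2} = 3.5$.

For verification, we find the value of E(Z) directly as below:
$E[2X - Y] = \sum_x \sum_y (2x - y) f(x, y)$
$= (2 \times 1 - 0)\frac{1}{6} + (2 \times 1 - 1)\frac{1}{6} + (2 \times 2 - 0)\frac{1}{6} + (2 \times 2 - 1)\frac{1}{6} + (2 \times 3 - 0)\frac{1}{6} + (2 \times 3 - 1)\frac{1}{6}$
$= \frac{21}{6} = 3.5$.
Hence the result.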


Example 7.20 Three dice, each numbered in the usual way from one to six, are coloured white,
red and blue respectively. After casting them, a boy 'scores' in the following way. To the white number
he adds twice the red number and then subtracts the blue number. Thus a white three, a red four and a
blue two would score 3 + 8 - 2 = 9. Assume that the boy casts the dice a large number of times. Find
the mean and the variance of the scores.

Let $X_w$, $X_r$ and $X_b$ denote the numbers when the white, red and blue dice are cast, and
let Y represent the score a boy gets by casting the dice. Then his score according to the condition is
$Y = X_w + 2X_r - X_b$, and we need E(Y) and Var(Y).

Now $E(Y) = E[X_w + 2X_r - X_b]$
$= E(X_w) + 2E(X_r) - E(X_b)$, where

$E[X_i,\ i = w, r, b] = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = \frac{21}{6}$

Therefore $E(Y) = \frac{21}{6} + 2\left(\frac{21}{6}\right) - \frac{21}{6} = 7$.

By definition, $Var(Y) = E(Y^2) - [E(Y)]^2$.
$E(Y^2) = E[X_w + 2X_r - X_b]^2$
$= E(X_w^2) + 4E(X_r^2) + E(X_b^2) + 4E(X_w X_r) - 2E(X_w X_b) - 4E(X_r X_b)$
$= E(X_w^2) + 4E(X_r^2) + E(X_b^2) + 4E(X_w)E(X_r) - 2E(X_w)E(X_b) - 4E(X_r)E(X_b)$, the dice being independent, where

$E[X_i^2,\ i = w, r, b] = (1 + 4 + 9 + 16 + 25 + 36) \times \frac{1}{6} = \frac{91}{6}$

Thus $E(Y^2) = 6\left(\frac{91}{6}\right) + (4 - 2 - 4)\left(\frac{7}{2}\right)^2 = 91 - \frac{49}{2} = \frac{133}{2}$, and

$Var(Y) = \frac{133}{2} - (7)^2 = 17.5$.
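The mean and variance in Example 7.20 can also be checked by brute force over all 216 equally likely outcomes. This is a minimal sketch, not the book's method; exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6 ** 3)  # each (white, red, blue) triple is equally likely

# Score rule from Example 7.20: white + 2*red - blue.
scores = [w + 2 * r - b for w, r, b in product(faces, repeat=3)]

mean = sum(s * p for s in scores)
variance = sum(s * s * p for s in scores) - mean ** 2
print(mean, variance)  # 7 35/2
```

The value 35/2 = 17.5 agrees with Var(Y) = (1 + 4 + 1) Var(X), since each die has variance 35/12.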
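Example 7.21 Let X and Y be independent r.v.'s with joint p.d.f.
$f(x, y) = \frac{x(1 + 3y^2)}{4}$, 0 < x < 2, 0 < y < 1
= 0, elsewhere.
Find E(X), E(Y), E(X + Y) and E(XY).
To determine E(X) and E(Y), we first find the marginal p.d.f. g(x) and h(y) as below:

$g(x) = \int_0^1 \frac{x(1 + 3y^2)}{4}\,dy = \frac{x}{2}$, for 0 < x < 2,

$h(y) = \int_0^2 \frac{x(1 + 3y^2)}{4}\,dx = \frac{1}{2}(1 + 3y^2)$, for 0 < y < 1.

$E(X) = \int_0^2 x\,g(x)\,dx = \int_0^2 \frac{x^2}{2}\,dx = \left[\frac{x^3}{6}\right]_0^2 = \frac{4}{3}$

$E(Y) = \int_0^1 y\,h(y)\,dy = \frac{1}{2}\int_0^1 (y + 3y^3)\,dy = \frac{1}{2}\left[\frac{y^2}{2} + \frac{3y^4}{4}\right]_0^1 = \frac{5}{8}$

$E(X + Y) = \int_0^1 \int_0^2 (x + y)\, f(x, y)\,dx\,dy$

$= \frac{1}{4}\int_0^1 \int_0^2 x^2 (1 + 3y^2)\,dx\,dy + \frac{1}{4}\int_0^1 \int_0^2 xy(1 + 3y^2)\,dx\,dy = \frac{4}{3} + \frac{5}{8} = \frac{47}{24}$

$E(XY) = \int_0^1 \int_0^2 xy\, f(x, y)\,dx\,dy = \frac{1}{4}\int_0^2 x^2\,dx \int_0^1 y(1 + 3y^2)\,dy = \frac{1}{4} \cdot \frac{8}{3} \cdot \frac{5}{4} = \frac{5}{6}$

It should be noted that
$E(X) + E(Y) = \frac{4}{3} + \frac{5}{8} = \frac{47}{24} = E(X + Y)$, and
$E(X)\,E(Y) = \frac{4}{3} \cdot \frac{5}{8} = \frac{5}{6} = E(XY)$.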
7.6.3 Covariance of Two Random Variables. The covariance of two r.v.'s X and Y is a
measure of the extent to which their values tend to increase or decrease together. It is denoted
by $\sigma_{XY}$ or Cov(X, Y), and is defined as the expected value of the product [X - E(X)] [Y - E(Y)]. That is

$$Cov(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$$
$$= E[XY - X E(Y) - Y E(X) + E(X) E(Y)]$$
$$= E(XY) - E(X) E(Y) - E(Y) E(X) + E(X) E(Y)$$
$$= E(XY) - E(X) E(Y).$$

If X and Y are independent, then E(XY) = E(X) E(Y), and
Cov(X, Y) = E(XY) - E(X) E(Y) = 0.

It is very important to note that covariance is zero when the r.v.'s X and Y are independent but its
converse is not generally true. The covariance of a r.v. with itself is obviously its variance.
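That zero covariance does not imply independence can be seen from a small constructed case. The distribution below is a hypothetical illustration (X uniform on {-1, 0, 1} and Y = X²), not an example from the text; the names `joint` and `expect` are likewise assumptions.

```python
from fractions import Fraction

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(func):
    """Expected value of func(x, y) under the joint p.d."""
    return sum(func(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require f(x, y) = g(x) h(y) for every pair; test the pair (0, 1).
g0 = sum(p for (x, _), p in joint.items() if x == 0)   # P(X = 0) = 1/3
h1 = sum(p for (_, y), p in joint.items() if y == 1)   # P(Y = 1) = 2/3
f01 = joint.get((0, 1), Fraction(0))                   # P(X = 0, Y = 1) = 0
print(cov, f01 == g0 * h1)  # 0 False
```

Here Y is completely determined by X, yet the covariance vanishes because the relationship is not linear.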


7.6.4 Variance of the Sum or Difference of two Random Variables. Let X and Y be two discrete
r.v.'s. Then the variance of the r.v. X + Y is defined as

$$Var(X + Y) = E[(X + Y) - E(X + Y)]^2$$
$$= E[\{X - E(X)\} + \{Y - E(Y)\}]^2$$
$$= E[X - E(X)]^2 + E[Y - E(Y)]^2 + 2E\{[X - E(X)][Y - E(Y)]\}$$
$$= Var(X) + Var(Y) + 2\,Cov(X, Y)$$

If X and Y are independent, then Cov(X, Y) = 0 and we have
Var(X + Y) = Var(X) + Var(Y).

Similarly, when X and Y are independent, the variance of the difference X - Y is
Var(X - Y) = Var(X) + Var(Y).
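The role of the covariance term is easiest to see when X and Y are not independent. The following sketch uses a hypothetical joint p.d. (chosen only for illustration) and checks the identity for the sum, together with the corresponding identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) for the difference.

```python
from fractions import Fraction

# Hypothetical joint p.d. of two dependent r.v.'s, chosen only for illustration.
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10)}

def expect(func):
    """Expected value of func(x, y) under the joint p.d."""
    return sum(func(x, y) * p for (x, y), p in joint.items())

def var(func):
    """Variance of func(X, Y), computed as E(func^2) - [E(func)]^2."""
    return expect(lambda x, y: func(x, y) ** 2) - expect(func) ** 2

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

assert var(lambda x, y: x + y) == var_x + var_y + 2 * cov
assert var(lambda x, y: x - y) == var_x + var_y - 2 * cov
print(var_x, var_y, cov)
```

When the covariance is zero, both identities reduce to the sum of the two variances, as stated above for independent r.v.'s.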


7.6.5 Correlation Co-efficient of Two Random Variables. Let X and Y be two r.v.'s with non-zero
variances $\sigma_X^2$ and $\sigma_Y^2$. Then the correlation co-efficient, which is a measure of linear relationship between
X and Y, denoted by $\rho_{XY}$ (the Greek letter rho) or Corr(X, Y), is defined as

$$\rho_{XY} = \frac{E\{[X - E(X)][Y - E(Y)]\}}{\sigma_X \sigma_Y} = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$$

If X and Y are independent r.v.'s, then $\rho_{XY}$ will be zero but zero correlation does not necessarily
imply independence. The correlation coefficient has the following properties:

i) Correlation co-efficient is unitless and symmetric in X and Y, i.e. $\rho_{XY} = \rho_{YX}$.

ii) Correlation co-efficient remains unchanged if constants are added to the r.v.'s or if the r.v.'s
are multiplied by constants having the same sign.

iii) Correlation co-efficient is a number between -1 and +1 inclusive. To prove this property, we
standardize the r.v.'s X and Y as

$$Z_1 = \frac{X - E(X)}{\sqrt{Var(X)}} \quad \text{and} \quad Z_2 = \frac{Y - E(Y)}{\sqrt{Var(Y)}}$$

It is obvious that $E(Z_1) = E(Z_2) = 0$ and $Var(Z_1) = Var(Z_2) = 1$.

Now $Var(Z_1 + Z_2) = Var(Z_1) + Var(Z_2) + 2\,Cov(Z_1, Z_2)$
But $Cov(Z_1, Z_2) = E(Z_1 Z_2) - E(Z_1) E(Z_2)$
$= E(Z_1 Z_2)$ [since $E(Z_1) = E(Z_2) = 0$]

$= \frac{E\{[X - E(X)][Y - E(Y)]\}}{\sqrt{Var(X)\,Var(Y)}} = \rho$

Thus $Var(Z_1 + Z_2) = 1 + 1 + 2\rho$ [since $Var(Z_1) = Var(Z_2) = 1$]
$= 2(1 + \rho)$
Since $Var(Z_1 + Z_2)$ must be non-negative, therefore it follows that
$2(1 + \rho) \ge 0$ which implies that $\rho \ge -1$.
Similarly, $Var(Z_1 - Z_2) = 2(1 - \rho)$, which implies that $\rho \le 1$.

Hence from these two results, we get $-1 \le \rho \le 1$.


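Example 7.22 The joint p.d. of X and Y, with the marginal totals g(x) and h(y), is:

   x \ y      0       1       2       3    |  g(x)
     0      0.05    0.05    0.10    0      |  0.20
     1      0.05    0.10    0.25    0.10   |  0.50
     2      0       0.15    0.10    0.05   |  0.30
   h(y)     0.10    0.30    0.45    0.15   |  1.00

Now $E(X) = \sum x_i g(x_i) = 0 \times 0.20 + 1 \times 0.50 + 2 \times 0.30$
= 0 + 0.50 + 0.60 = 1.10
$E(Y) = \sum y_j h(y_j) = 0 \times 0.10 + 1 \times 0.30 + 2 \times 0.45 + 3 \times 0.15$
= 0 + 0.30 + 0.90 + 0.45 = 1.65
$E(X^2) = \sum x_i^2 g(x_i) = 0 \times 0.20 + 1 \times 0.50 + 4 \times 0.30 = 1.70$
$E(Y^2) = \sum y_j^2 h(y_j) = 0 \times 0.10 + 1 \times 0.30 + 4 \times 0.45 + 9 \times 0.15 = 3.45$

Thus $Var(X) = E(X^2) - [E(X)]^2 = 1.70 - (1.10)^2 = 0.49$, and
$Var(Y) = E(Y^2) - [E(Y)]^2 = 3.45 - (1.65)^2 = 0.7275$

$E(XY) = \sum_i \sum_j x_i y_j\, f(x_i, y_j)$
= 1 × 0.10 + 2 × 0.15 + 2 × 0.25 + 4 × 0.10 + 3 × 0.10 + 6 × 0.05
= 0.10 + 0.30 + 0.50 + 0.40 + 0.30 + 0.30 = 1.90
Cov(X, Y) = E(XY) - E(X) E(Y) = 1.90 - 1.10 × 1.65 = 0.085, and
$\rho = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{0.085}{\sqrt{(0.49)(0.7275)}} = \frac{0.085}{0.597} = 0.14$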
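The calculation of a correlation co-efficient from a two-way table is mechanical and can be sketched in a few lines. The joint probabilities below are an assumption: the cells entering the E(XY) sum are those of the worked example, while the remaining cells are inferred from the marginal totals 0.20, 0.50, 0.30 and 0.10, 0.30, 0.45, 0.15.

```python
from math import sqrt

# Assumed joint p.d. f(x, y); cells with zero probability are omitted.
joint = {
    (0, 0): 0.05, (0, 1): 0.05, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.10, (1, 2): 0.25, (1, 3): 0.10,
    (2, 1): 0.15, (2, 2): 0.10, (2, 3): 0.05,
}

def expect(func):
    """Expected value of func(x, y) under the joint p.d."""
    return sum(func(x, y) * p for (x, y), p in joint.items())

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x ** 2
var_y = expect(lambda x, y: y * y) - e_y ** 2
cov = expect(lambda x, y: x * y) - e_x * e_y
rho = cov / sqrt(var_x * var_y)
print(round(cov, 4), round(rho, 4))
```

The small positive value of rho indicates only a weak linear relationship between X and Y.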
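Example 7.23 If $f(x, y) = x^2 + \frac{xy}{3}$, 0 < x < 1, 0 < y < 2

= 0, elsewhere,
find Var(X), Var(Y) and Corr(X, Y).

The marginal p.d.f.'s are

$g(x) = \int_0^2 \left(x^2 + \frac{xy}{3}\right) dy = 2x^2 + \frac{2x}{3}$, 0 < x < 1, and

$h(y) = \int_0^1 \left(x^2 + \frac{xy}{3}\right) dx = \frac{1}{3} + \frac{y}{6}$, 0 < y < 2.

Now $E(X) = \int_0^1 x\,g(x)\,dx = \frac{13}{18}$, and

$E(Y) = \int_0^2 y\,h(y)\,dy = \frac{10}{9}$.

Thus $Var(X) = E[X - E(X)]^2 = \int_0^1 \left(x - \frac{13}{18}\right)^2 g(x)\,dx = \frac{73}{1620}$,

$Var(Y) = E[Y - E(Y)]^2 = \int_0^2 \left(y - \frac{10}{9}\right)^2 h(y)\,dy = \frac{26}{81}$,

$Cov(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$

$= \int_0^2 \int_0^1 \left(x - \frac{13}{18}\right)\left(y - \frac{10}{9}\right) f(x, y)\,dx\,dy = -\frac{1}{162}$.

Hence $Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{-1/162}{\sqrt{(73/1620)(26/81)}} = -0.05$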
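The quoted results 73/1620, 26/81 and -1/162 can be verified exactly when the density is a polynomial, because each moment is then a sum of elementary integrals. The sketch below assumes the density is f(x, y) = x² + xy/3 on 0 < x < 1, 0 < y < 2 (this form reproduces all three quoted values); the function name `moment` is hypothetical.

```python
from fractions import Fraction
from math import sqrt

def moment(a, b):
    """E(X^a Y^b) for the assumed density f(x, y) = x^2 + x*y/3 on 0 < x < 1, 0 < y < 2."""
    def ix(m):  # integral of x^m over (0, 1)
        return Fraction(1, m + 1)
    def iy(n):  # integral of y^n over (0, 2)
        return Fraction(2 ** (n + 1), n + 1)
    # Integrate x^a y^b * x^2 and x^a y^b * (x*y/3) term by term.
    return ix(a + 2) * iy(b) + Fraction(1, 3) * ix(a + 1) * iy(b + 1)

assert moment(0, 0) == 1  # the density integrates to one

var_x = moment(2, 0) - moment(1, 0) ** 2
var_y = moment(0, 2) - moment(0, 1) ** 2
cov = moment(1, 1) - moment(1, 0) * moment(0, 1)
corr = float(cov) / sqrt(float(var_x * var_y))
print(var_x, var_y, cov, round(corr, 2))  # 73/1620 26/81 -1/162 -0.05
```

Using E(X²) - [E(X)]² in place of the integral of (x - E(X))² g(x) gives the same variance with less algebra.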


7.7 MEDIANS AND MODES OF CONTINUOUS RANDOM VARIABLES


If X is a discrete r.v., then a value 'a' that satisfies the inequalities

$$P(X \le a) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge a) \ge \frac{1}{2}$$

is called a median. If the r.v. X is continuous, then median is a value of X that satisfies the
equation $F(x) = \frac{1}{2}$. In other words, the median 'a' is given by

$$\int_{-\infty}^{a} f(x)\,dx = \int_{a}^{\infty} f(x)\,dx = \frac{1}{2}$$

Similarly, the two quartiles $Q_1$ and $Q_3$ are defined by the equations

$$\int_{-\infty}^{Q_1} f(x)\,dx = \frac{1}{4} \quad \text{and} \quad \int_{-\infty}^{Q_3} f(x)\,dx = \frac{3}{4}$$

The mode in case of a continuous r.v. is such a stationary value of f(x) for which the density is
maximum. That is, we get a maximum value when

(i) $f'(x) = 0$, and (ii) $f''(x) < 0$,

provided that the solution of $f'(x) = 0$ lies within the given range of the r.v. X.

It should also be noted that in case of continuous r.v. X, the geometric mean, G, and the harmonic mean,
H, are defined as

$\log G = E(\log X) = \int \log x\, f(x)\,dx$, provided the integral exists, and

$\frac{1}{H} = E\left(\frac{1}{X}\right) = \int \frac{1}{x}\, f(x)\,dx$.


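Example 7.24 Let X be a r.v. with p.d.f.
$f(x) = k(x - x^2)$, 0 < x < 1

= 0, elsewhere,

where k is a constant. Then find the mean, median, mode, harmonic mean and standard deviation.

First of all, we find the value of k which should be such as to make

$$\int_0^1 k(x - x^2)\,dx = 1$$

That is $k\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_0^1 = 1$ or $k\left(\frac{1}{2} - \frac{1}{3}\right) = 1$ or k = 6.

Now, the mean, $\mu$ or E(X), is given by

$$\mu = E(X) = \int_0^1 x \cdot 6(x - x^2)\,dx = 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = \frac{1}{2}$$

The median, a, is given by $\int_0^a f(x)\,dx = \frac{1}{2}$. Thus

$\int_0^a 6(x - x^2)\,dx = \frac{1}{2}$ or $6\left(\frac{a^2}{2} - \frac{a^3}{3}\right) = \frac{1}{2}$, i.e. $3a^2 - 2a^3 = \frac{1}{2}$, or

$$4a^3 - 6a^2 + 1 = 0$$

We factorize the equation $4a^3 - 6a^2 + 1 = 0$ as

$$(2a - 1)(2a^2 - 2a - 1) = 0$$

Either $2a - 1 = 0$, which gives $a = \frac{1}{2}$, or

$2a^2 - 2a - 1 = 0$, which gives $a = \frac{1 \pm \sqrt{3}}{2}$, i.e. a = -0.366 or 1.366.

The values -0.366 and 1.366 are unacceptable since a must lie in the interval (0, 1). The median
is therefore given by $a = \frac{1}{2}$.

The mode is that value of x for which

(i) $f'(x) = 0$, and (ii) $f''(x) < 0$.

Now $f(x) = 6(x - x^2)$, and $f'(x) = 6(1 - 2x)$, so that

$f'(x) = 0$ when 1 - 2x = 0 or $x = \frac{1}{2}$.

To check that this is maximum, we find
$f''(x) = 6(-2) = -12 < 0$,

i.e. $f''(x)$ is negative for all values of x, so there is a maximum at $x = \frac{1}{2}$.
Hence the mode = $\frac{1}{2}$.

The harmonic mean, H, is given by

$\frac{1}{H} = \int_0^1 \frac{1}{x} \cdot 6(x - x^2)\,dx = 6\int_0^1 (1 - x)\,dx = 3$, so that $H = \frac{1}{3}$.

Again, $\mu'_2 = E(X^2) = \int_0^1 x^2 f(x)\,dx = 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1 = \frac{6}{20} = 0.30$, and

$\sigma^2 = \mu'_2 - \mu^2 = 0.30 - 0.25 = 0.05$.
Hence $\sigma = \sqrt{0.05} = 0.2236$.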
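When the median equation cannot be factorized by inspection, it can be solved numerically from the distribution function. The sketch below is an illustration under the assumption k = 6, so that F(a) = 3a² - 2a³ on (0, 1); it finds the median by bisection and recomputes the standard deviation. The name `cdf` is hypothetical.

```python
from math import sqrt

def cdf(a):
    """F(a) = 3a^2 - 2a^3 on [0, 1], the integral of 6(x - x^2) from 0 to a."""
    return 3 * a ** 2 - 2 * a ** 3

# Median: solve F(a) = 1/2 by bisection on (0, 1); F is increasing there.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

mean = 6 * (1 / 3 - 1 / 4)             # integral of 6x(x - x^2) over (0, 1)
second_moment = 6 * (1 / 4 - 1 / 5)    # integral of 6x^2(x - x^2) over (0, 1)
sigma = sqrt(second_moment - mean ** 2)
print(round(median, 6), round(mean, 6), round(sigma, 4))
```

Restricting the search to (0, 1) automatically discards the two roots of the cubic that lie outside the range of X.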
7.8 CHEBYSHEV'S INEQUALITY 


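If X is a r.v. having mean $\mu$ and variance $\sigma^2 > 0$, and k is any positive constant, then the
probability that a value of X falls within k standard deviations of the mean is at least $\left(1 - \frac{1}{k^2}\right)$. That is

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2},$$

or equivalently $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$.

By definition, we have

$$\sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$

Dividing the range into 3 disjoint parts $(-\infty, \mu - k\sigma)$, $(\mu - k\sigma, \mu + k\sigma)$ and $(\mu + k\sigma, \infty)$, we have

$$\sigma^2 = \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx$$

Ignoring the middle term, which is non-negative, we get

$$\sigma^2 \ge \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx$$

Now when $|x - \mu| \ge k\sigma$, we have $(x - \mu)^2 \ge k^2\sigma^2$ for $x \ge \mu + k\sigma$ as well as for
$x \le \mu - k\sigma$. It therefore follows that

$$\sigma^2 \ge k^2\sigma^2 \left[\int_{-\infty}^{\mu - k\sigma} f(x)\,dx + \int_{\mu + k\sigma}^{\infty} f(x)\,dx\right] = k^2\sigma^2\, P(|X - \mu| \ge k\sigma),$$

so that $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$, and hence

$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$

This inequality is due to the Russian mathematician P.L. Chebyshev (1821-1894) and it provides a
means of understanding how the variance measures variability about the mean of a r.v. It holds for all
distributions having finite mean and variance. In case of a discrete r.v., the proof is the same with integrals
replaced by summations.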
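Because the inequality holds for every distribution with finite mean and variance, the bound is usually far from tight for any particular one. The sketch below checks it for a hypothetical example, the face shown by a fair die; the name `tail_prob` is an assumption.

```python
from fractions import Fraction

# Hypothetical example: X is the face shown by a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

def tail_prob(k):
    """P(|X - mu| >= k*sigma), computed as P((X - mu)^2 >= k^2 * var) to stay exact."""
    return sum(p for x, p in pmf.items() if (x - mu) ** 2 >= k * k * var)

for k in (Fraction(1), Fraction(5, 4), Fraction(3, 2), Fraction(2)):
    # Chebyshev: the tail probability never exceeds 1/k^2.
    assert tail_prob(k) <= 1 / (k * k)
    print(k, tail_prob(k), 1 / (k * k))
```

Comparing squared deviations with k² times the variance avoids taking a square root, so the whole check runs in exact arithmetic.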


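7.9 MOMENT GENERATING FUNCTION

The moment generating function (m.g.f.), usually denoted by $M_0(t)$, of a random variable X about the
origin, if it exists, is defined as the expected value of the r.v. $e^{tX}$, where t is a real variable lying in a
neighbourhood of zero. That is

$$M_0(t) = E(e^{tX}) = \sum_i e^{t x_i} f(x_i), \text{ if X is discrete,}$$

$$= \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, \text{ if X is continuous.}$$

Since $e^{tx} = 1 + tx + \frac{(tx)^2}{2!} + \dots$, therefore for the discrete case

$$M_0(t) = \sum_i \left[1 + t x_i + \frac{(t x_i)^2}{2!} + \dots + \frac{(t x_i)^r}{r!} + \dots\right] f(x_i)$$

$$= \sum_i f(x_i) + t\sum_i x_i f(x_i) + \frac{t^2}{2!}\sum_i x_i^2 f(x_i) + \dots + \frac{t^r}{r!}\sum_i x_i^r f(x_i) + \dots$$

$$= 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \dots + \frac{t^r}{r!}E(X^r) + \dots$$

$$= 1 + \mu'_1 t + \mu'_2 \frac{t^2}{2!} + \mu'_3 \frac{t^3}{3!} + \dots + \mu'_r \frac{t^r}{r!} + \dots$$

Thus we find that the co-efficient of $\frac{t^r}{r!}$ in the expansion of $M_0(t)$ is just $E(X^r)$ or $\mu'_r$, the rth
moment about zero. We call the function $M_0(t)$ a moment generating function because it generates
the moments of the r.v. X. It should be noted that a m.g.f. would exist only if the sum or integral
converges for all values of t in a neighbourhood of zero.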

It is more convenient to find the moments by differentiating the m.g.f. r times w.r. to t and then putting
t = 0. The m.g.f. of X is

$$M_0(t) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \dots + \frac{t^r}{r!}E(X^r) + \dots$$

Differentiating w.r. to t, we get

$$\frac{dM_0(t)}{dt} = E(X) + tE(X^2) + \frac{t^2}{2!}E(X^3) + \dots$$
and $\left[\frac{dM_0(t)}{dt}\right]_{t=0} = E(X) = \mu'_1$.

Differentiating it again, we obtain
$$\frac{d^2M_0(t)}{dt^2} = E(X^2) + tE(X^3) + \frac{t^2}{2!}E(X^4) + \dots$$
and $\left[\frac{d^2M_0(t)}{dt^2}\right]_{t=0} = E(X^2) = \mu'_2$.

Differentiating r times and equating t = 0, we get

$$\left[\frac{d^rM_0(t)}{dt^r}\right]_{t=0} = E(X^r) = \mu'_r$$

The m.g.f. about the value a is defined as the expected value of $e^{t(X - a)}$
and is denoted as
$$M_a(t) = E\left[e^{t(X - a)}\right] = e^{-at}E(e^{tX}) = e^{-at}M_0(t)$$
Thus the m.g.f. about the value a is equal to $e^{-at}$ times the m.g.f. about the origin.


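Similarly, we make a change in the scale, and define the m.g.f. by introducing a new variable u as

$u_i = \frac{x_i - a}{h}$ so that $x_i = a + hu_i$.

Then $M_0(t) = \sum e^{t x_i} f(x_i) = \sum e^{t(a + hu_i)} f(x_i) = e^{at}\sum e^{thu_i} f(x_i) = e^{at}M_u(th)$

Substituting $\frac{t}{h}$ for t in both sides, we get $M_u(t) = e^{-at/h}\, M_0\!\left(\frac{t}{h}\right)$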


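The m.g.f. has a very important property, i.e. the m.g.f. of the sum of independent random
variables is the product of the individual m.g.f.'s. Consider two independent r.v.'s X and Y. Then

$$M_{X+Y}(t) = E\left[e^{t(X + Y)}\right] = E\left(e^{tX} e^{tY}\right) = E\left(e^{tX}\right)E\left(e^{tY}\right) = M_X(t)\,M_Y(t)$$

If the n independent random variables have identical probability distributions, then we have

m.g.f. of $\sum X$ = [m.g.f. of X]$^n$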

7.9.1 Cumulant Generating Function. The cumulants are a set of parameters of a probability
distribution defined by the following identity in t:

$$\exp\left\{\sum_{r=1}^{\infty} K_r \frac{t^r}{r!}\right\} = 1 + \sum_{r=1}^{\infty} \mu'_r \frac{t^r}{r!}$$

where $K_r$ is the rth cumulant. In other words, the cumulants are given by the co-efficients in the
expansion of a power series from the natural logarithm of the m.g.f. of a random variable, provided such
expansion exists. Thus

$$K(t) = \log_e M_0(t)$$

$$= K_1 t + K_2\frac{t^2}{2!} + K_3\frac{t^3}{3!} + \dots + K_r\frac{t^r}{r!} + \dots$$

The co-efficients $K_1, K_2, K_3, \dots$ are the first, the second, the third cumulant, etc. and K(t) is
called the cumulant generating function (c.g.f.). The c.g.f. possesses a very important property that
the c.g.f. of the sum of independent random variables is the sum of their cumulant generating
functions.

We differentiate the above relation r times with respect to t and then put t = 0 to find the rth
cumulant. That is

$$K_r = \left[\frac{d^r}{dt^r}\log_e M_0(t)\right]_{t=0}$$

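7.9.2 Relation between Cumulants and Moments. By definition, we have
$$K(t) = \log_e M_0(t) = \log_e \sum_{r=0}^{\infty} \mu'_r \frac{t^r}{r!}$$

or $K_1 t + K_2\frac{t^2}{2!} + \dots + K_r\frac{t^r}{r!} + \dots = \log_e\left(1 + \mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots + \mu'_r\frac{t^r}{r!} + \dots\right)$

$$= \left(\mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots\right) - \frac{1}{2}\left(\mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots\right)^2 + \frac{1}{3}\left(\mu'_1 t + \mu'_2\frac{t^2}{2!} + \dots\right)^3 - \dots$$

$$= \mu'_1 t + \left(\mu'_2 - \mu'^2_1\right)\frac{t^2}{2!} + \left(\mu'_3 - 3\mu'_1\mu'_2 + 2\mu'^3_1\right)\frac{t^3}{3!} + \dots$$
Comparing the co-efficients of like powers of t, we get
$K_1 = \mu'_1$
$K_2 = \mu'_2 - \mu'^2_1 = \mu_2$
$K_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2\mu'^3_1 = \mu_3$
$K_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3\mu'^2_2 + 12\mu'_2\mu'^2_1 - 6\mu'^4_1$

$= \mu_4 - 3\mu_2^2$.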

7.9.3 Characteristic Function. The m.g.f. does not exist for many probability distributions. We therefore use
another function, called the characteristic function (c.f.). The characteristic function of a r.v. X, denoted
by $\phi(t)$, is defined as the expected value of the r.v. $e^{itX}$:

$$\phi(t) = E(e^{itX})$$

$$= \sum e^{itx} f(x) \quad \text{or} \quad \int e^{itx} f(x)\,dx$$

according as X is discrete or continuous, and where t is a real number and $i = \sqrt{-1}$, the imaginary unit.
The characteristic function always exists because $|e^{itx}| = 1$ for all real t, and hence may be defined for
every probability distribution. The characteristic function has this advantage over the moment
generating function. The c.f. may be written as a series

$$\phi(t) = 1 + \mu'_1 (it) + \mu'_2 \frac{(it)^2}{2!} + \dots + \mu'_k \frac{(it)^k}{k!} + \dots$$

where $\mu'_k$, the kth moment of X about the origin, is the co-efficient of $\frac{(it)^k}{k!}$. Applications of these
functions are given in the chapters that follow.


'True' or 'False': if the statement is not true, then replace the underlined words with words
that make the statement true.

i) A random variable can assume only one value with a given probability.
ii) A random variable that can assume a finite set of possible values is known as a continuous
random variable.
iii) A random variable that can assume any value in a given interval (a, b) is known as discrete
random variable.
iv) The expected value for any discrete random variable is always equal to $\sum (x - \mu)^2 p(x)$.
v) The variance for any discrete random variable is $\sum x\,p(x)$.
vi) Discrete random variables may assume only positive values.
vii) The expected value of a constant is zero.
viii) The expected value of the product of any two random variables is equal to the product of their
expected values.
ix) Sum of probabilities for any probability distribution is equal to zero.
x) For any random variable having mean $\mu$ and variance $\sigma^2$ and k > 1, the probability that
a value of X falls within k standard deviations of the mean is less than $\left(1 - \frac{1}{k^2}\right)$.

b) MULTIPLE CHOICE QUESTIONS


i) A random variable is also known as a
a) Chance variable.
b) Stochastic variable.
c) Variate.
d) All of above.

ii) The distribution function of a random variable X, denoted by F(x), is defined as
a) F(x) = P(X ≤ x)
b) F(x) = P(X > x)
c) F(x) = P(X = x)
d) None of above.

iii) A discrete probability distribution may be represented by
a) A table.
b) A mathematical function.
c) A histogram.
d) All of above.

iv) A continuous probability distribution is not represented by
a) A table.
b) A mathematical function.
c) A graph.
d) A density function.

v) If X and Y are two independent random variables, then Var(X - Y) is equal to
a) Var(X) - Var(Y)
b) Var(X) + Var(Y)
c) Var(X) + Var(Y) - 2 Cov(X, Y)
d) None of above

vi) If X is a random variable and a and b are constants, then Var(aX + b) is equal to
a) a² Var(X)
b) Var(aX)
c) a² Var(X) + Var(b)
d) a Var(X)

vii) Given the following distribution for a random variable X:

   x    |   1      2      3      4      5      6    | Total
   f(x) |  0.10   0.20   0.20   0.25   0.15   0.10  | 1.00

the standard deviation of X is
a) 2.000
b) 1.4654
c) 3.5064
d) 2.1475

viii) then a is
a) 0.1000
b) 0.1111
c) 0.2000
d) None of above.

ix) If X and Y are two random variables, then E(X + Y) is equal to
a) E(X) + E(Y)
b) E(X) E(Y)
c) E(X) - E(Y)
d) None of above

x) If two discrete random variables X and Y are independent, which of the following statements is
true?
a) P(X = 4) = P(X = 4 | Y = 2)
b) P(X = 4 and Y = 2) = P(X = 4) P(Y = 2)
c) P(X = 4 and Y = 2) ≠ P(X = 4) P(Y = 2)
d) P(X = 2) = P(X = 2 | Y = 4)

SUBJECTIVE 
7.1 a) Explain the concept of a random variable. What is a distribution function and what are its
properties?
b) Let X be a random variable denoting the number of points appearing in a throw of a die.
Determine the distribution function F(x), x is a real number and draw its graph.
7.2 a) Define a discrete random variable and its probability distribution. What are the
properties of all probability distributions?
b) Suppose X has a p.d. given by
fx) Зат 30. 6с 
(i) Determine c. (ii) What is the p.d. of Y = 2X + 1?
c) Determine the p.d. of a r.v. X, where X denotes the number of aces in a hand of bridge.
7.3 а) Define a discrete r.v. Giving illustrations, explain ұз is meant Бу a discrete p.d 
6 b) A bag contams 4 red and 6 black balls. A sam f 4 balls is selected from the 
a) = replacement. Let X be the number of red bal Find the p.d. for X. (P.U, В.А. 
74 aj A large store places its last 15 clock гаф 8 clearance sale, Unknown to any 
radios are defective. If a customer, t different clock radios selected at 
the p.d. of X= number of defect os in the sample? 
b) Three balls are drawn from (big containing 5 white and 3 black balls. If ¥ 
n number of white balls d m the bag, then find the p.d. of X. 
7.5 a) Explain the concept of distribution function. Hence or otherwise, differentiate between
discrete and continuous random variables.
b) Given the а probability function - 
fx х-2х7),0<х<2 and zero otherwise. 
CalculatQhe value of А so as f(x) may be a p.d.f. (P.U., B. 
7.6 a) Define a continuous r.v. and its probability density function.
b) A continuous r.v. X has the p.d.f. as follows:
x/2 for 0 < x ≤ 1


10-9) (г1<х<2 

fa)=}2 for 2<x3 
54- х) for 3<х<4 
Же 


Compute P(X ≥ 3), P(X = 2), P(|X| < 1.5) and P(1 < X < 3).
a) Explain the concepts of the Probability Function, Probability Density Function and
Distribution Function. (P.U., B.A./B.Sc. 1990, 92)


b) A continuous r.v. X has the p.d.f.
f(x) = A(2 - x)(2 + x), 0 ≤ x ≤ 2

= 0, elsewhere.


Find (i) the value of A, (ii) P(X =>), ван) P(X <1), (іу) P(X 22), (у) Р(1<х<2). 
(P.U., B.A./B.Sc. 1993) 


Let X be a continuous r.v. with p.d.f.
f(x) = 6x(1 - x), 0 ≤ x ≤ 1
= 0, otherwise.

Check that f(x) is a p.d.f. and sketch it.

Obtain an expression for the distribution function of X.
| Compute At<x <3) and Axszitex<3} SS 
» Determine a number b such that P(X«b) = р> (P.U., M.Sc. 1974) 


Suppose that the life length (in hours) of a certain tube is a continuous random variable X with
probability density function

$f(x) = \frac{100}{x^2}$, x > 100, and zero elsewhere.

What is the probability that a tube will last less than 200 hours, if it is known that the tube is
still functioning after 150 hours of service?

What is the probability that if 3 such tubes are installed in a set, exactly one will have to be
replaced after 150 hours of service?

What is the maximum number of tubes that may be inserted into a set so that there is a
probability of 0.5 that after 150 hours of service all of them are still functioning?
(P.U., B.A. (Hons.) Part-I, 1970)


Explain the terms: a joint distribution function, a joint probability distribution, marginal and
conditional distributions.


Given the joint p.d. of two r.v.'s X and У, whose values Дх, y) 
к 6 1 1 4 
L)=—, fl, D=— /(,3)---,/(2,)---, 
ше /(1,1) 30° 1 = 39776 ) 30° ^^ ney 
S 1 2 4 
Һ/(2,2)---. 3) ==, f3, )=—, /(3,2)---. 
72,2) 30 70,3) 30° ^t ) 30 /@, 2) 30 
69-5, find all the marginal and conditional distributions. 
AN 


| 278 


pt 


715 


b)  Iftwor.v.'s Xand Y have the joint p.d.f. 
Suppose that the following table represents the joint p.d. of the discrete r.v. (X, Y):


1 
6 
L 
9 
i 
4 


a) Compute g(x), h(y), f(x | y) and f(y | x).
b) Decide whether X and Y are independent.


Let X and Y have the joint probability functions given by  . 
) -Ль»-%. х-2,4,5,у-1,2,3. 


т b 
ii) fix, y=. х%1,2,5у- 2. SS 
Find the marginal probability functions of X and Y, afia out if X and Y are indepen 

(P.U.. B. AJ 


a) " Whatisa joint probability density ыы does a marginal pronti. 
from a conditional probability functi 


b) Given the joint p.d.f. f(x, Sty (х+у),0<х<1,0<у<1 and 0 е 
marginal and conditional p£Q) 
Suppose the joint p.d.f. of (X, A given by 


Д(ху)=х ыры, 0<у<2 and 0 elsewhere. 
a) Check t jf») dx dy =1. 


b) | Compute (i) Ax 2 (ii) P(Y<X), (iii) Р(Х+Ү>1) and (iv) ңү «x « 


3) Explain clearly the meaning of marginal and conditional distributions with 
bivariate density function f(x, y). 


f(x, y) i+) for0<x<1,0<y<l, 


=0, elsewhere; 
then find the marginal distributions of X and Y, and their conditional distri 


Ңх <4ir=4} 
c 2 2 
Given the joint density function of the r.v. (X, Y) as
f(x, y) = 3x²y + 3xy², for 0 < x < 1, 0 < y < 1,
= 0, otherwise.
Find the marginal and conditional density functions. Also find the conditional probability


At sx 5215 <у< 2 (P.U., В.А. (Hons.) Рап-Ш, 1964, 65, 68, 70) 


2 


Explain the idea of joint probability distribution, conditional distribution and marginal distribution. 
The random variables X and Y are jointly distributed as 


f(x y) 224xy(I-x) 0<х,у<1. 


Obtain the marginal distribution of X, and the conditional distribution o; Y. Are Х and Y 
»dependent? (P.U., M.Sc. 1970; B.Sc. Hons. Part-II, 1972) 
а) Define E(X), the expected value of a random variable X. =: 
If X is a r.v. and if a and b are constants, then prove that
E(aX + b) = a E(X) + b.
If X and Y are r.v.'s, then show that E(X + Y) = E(X) + E(Y).
Show that, under certain conditions to be stated,
E(XY) = E(X) E(Y).
Explain random variable and its mathematical expectation.
Prove that E(cX) = cE(X), where c is a constant.

Two unbiased dice are cast and a payment equal to the sum of the spots on the top sides is
given the caster. Compute the expected value of the payment.


X. А committee of am 5 to be selected at random from 3 women and 5 men. Find the 
expected number о теп on the committee. (P.U., В.А./В Sc. 1979) 


^Let X have the possible values 
1 


ly 
TERES 


і 
2,- 25, 24,.., (=) Z, and {һе corresponding probabilities i 5 H 


Find the expectation of X if it exists. 2, (P.U., B.A. Hons: 1970) 


Define expectation and prove that the expectation of the sum of two random variables is
equal to the sum of their expectations. (P.U., B.A./B.Sc. 1979, 81)


Tae 


b) 


a) 


b) 


ы 


b) 


Let X, and Же two independent r.v.'s having variances Ё and 2 


Verify that E(X) + E(Y) = E(X + Y) by using the random variable X with the p.d. Ж 


х= 1,2,3,4, and the ry. ¥ wit the pd. f(y) = G G yey Uae: 
(Gomal, B.A./B 
The p.d. of a diScrete random variable X is 
3 1 x 3 3-х 
- i| .х-0,1,2,3. 
f(x) 88 B x 
Find Е(Х) and EÊ). -- (P.U., B.A JB- 


Let X be a random variable with probability distribution 
2 3 


0.125 0.50 0.20 0.05 0.125 


i). Find E(X) and Var(X). 
ii) Find the p.d. of the r.v. Y= 2X + 1. Using the RPof Y determine E(Y) and У 


iii) How are Е(Х) and Е(У), and Var(X) related? 
For what value of A, the function делега о 
f(x) = AP (=x), 0х1 д 


=0, d 


Find its mean and variance. 


Also find ңі <Х< m its distribution function. (P.U., В.А. 
Show that а EQ?) - [Е(Х)]?. ` 


w will be a p.d.f.? 


Var(3X; — X,) = 25, find k. 
The following table shows the distribution function F(x) of the r.v. X: 


544 х<1 х<2 х<3 х<4 
` F(x): 1/8 3/8 3/4 1 
Find (i) probability distribution of the r.v. X, (ii) E(X) and Var(X). (Р.О., B. 
If f(x)= LIS for x = 2, 3, ..., 12, then find the mean and variance of 


variable X. 


If f(x) = =, (х= 1,2, А n), then find E(X) and Var(X). 
c) Suppose X can take the values 0, 1, 2, ..., n with frequencies proportional to the binomial
co-efficients $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$. Show that $E(X) = \frac{n}{2}$ and $Var(X) = \frac{n}{4}$.
(P.U., M.A. 1970; B.A. (Hons.) Part-II, 1970)


A and B throw with one die for a prize of Rs. 11, which is to be won by the player who first
throws 6. If A has the first throw, what are their respective expectations?


A, B, C and D cut a pack of cards successively in the order mentioned. If the person who 
cuts a spade first, receives £ 175, what are their expectations? 


A bag contains 2 white and 2 black balls. Three persons 4; B and C in the order named 
above draw a ball and do not replace it. The person who draws a white ball first receives Rs. 
18. What are their expectations? (P.U., B.A/B.Sc.,1986) 


A distribution is given by f(x) = x*(1 — x) between x = 0 у” 1. Find the mean and 
standard deviation. 


9 
А continuous г.у. X has p.d.f. given by f(x) = cx for IX © 
(i) Determine the constant с. (ii) Find the mean, ула and standard deviation of X. 
л < (P-U., В.А /В.5с. 1980) 
А variable X has the р... - 
Год-0, for х<0. Nek 

=3 (997 © 4 

= "ig 2y', for0« x «2. ©) 

= & Jor x22. д? 


Find the expected value of. x its standard deviation. 


Find the value of k so M function f(x) defined as follows, may be a density function. 


. f(x) = (1-5), 0<х<1. 
= 0, otherwise 
Also determine its mean and variance. (P.U., В.А./В.ӛс. 1978) 


А r.v. X has the p.d.f. as 
f(a) = 4x(9- x"), 0<х<3 
=0, otherwise 
nd the value of 4, the mean and the standard deviation of.Y. (P.U,, В.А./В,8е. 1990) 


` 


282 
7.29 


7.30 


7.31 


7.32 


7.34 
A r.v. X has the probability density function given by


«or». -3«xs-l 
fos g 6210 -і<х5<1 
te-a, 1<х<3 
0, elsewhere. 
Find the mean and the standard deviation of the r.v. X. (P.U., B.Sc. Hons. Р: 


Find k, the mode and the mean for the distribution, the equation of whose p.d.f. is f(x) = 
{ог 0 < x < 3. Alse determine its variance. : (P.U., В.А 
A continuous г.у. X has the p.d.f. given by 

S (x) =k(2-x)(x-5), 2<х<5 

=0, elsewhere. 4 

Find the value of Е, mean and variance. What аге "e o of the mode and: 
distribution of.X? 
Let X be a.v. with p.d.f. S 

fQ)-62-x)x-1, 1<х<2 ND 


20, onm 


WC xd by : 3 


6 log (16 С) = 19 (P.U., B.A. Hons. 


Explain what is meant by. m ofa distribution, and define a suitable measure 
Use this definition to Іше the skewness of the distribution defined by the density 


f(a) =k qa" 0<х<1ап40, otherwise, 
where k is а normalizing constant to be determined. It may be assumed that 


M 
1 E С p Ж” 
fr Ue 75 ЗВ ha 


a) · The rectangular distribution is given by y = k between x = -а and x = а. 
variance and mean deviation. 


b)  Thep.d.f. of a rectangular distribution is 
f(x) =1, for 0<'х<1. 


Obtain the first four mean moments and obtain also the mean deviation 
function. (P-.U., 
Let X be a continuous random variable with p.d.f. given by


Eo фах е ; 
2 
d 15х<2 
/(х)={2' T 
(E 25х53 4 
2 
0, elsewhere. 


Find the mean, the variance and the moment measure of kurtosis of X. 
қ; H 


Find the first four moments about the mean of the distribution Дх) = х (6x)! between x = 0 and | 
x 6, and the kurtosis of the distribution. 477 


Let.X and Y have the joint p.d.f. described as follows: 


p. 


ч : e. 5 
Prove that the correlation co-efficient of two r.v.'s must be a number between -1 and
+1 inclusive.

Given f(x, y) = 2 - x - y, 0 < x < 1, 0 < y < 1, and 0 elsewhere. Find the co-efficient of
correlation between X and Y.

State and prove Chebyshev's inequality. Explain the significance of this inequality.
Show that the variance of the sum (or difference) of two independent r.v.'s is equal to the
sum of their variances.

Define Moment Generating Function and Characteristic Function of a random variable X.

Show that the m.g.f. of the sum of two independent r.v.'s is the product of their m.g.f.'s.
(P.U., B.A./B.Sc. 1993)

Two fair dice are rolled. Find the probability distribution of minimum of two numbers. 

А bag contains 2 white and 3 black balls. Four persons А,В, C and D in the order named 


above, each draw a ball and do not replace it. The person who draws a white ball first
receives Rs. 10. What are their respective expectations?
(P.U., B.A./B.Sc. 2006)

Define:
i) Probability Function ii) Probability Density Function
iii) Distribution Function


b) 


c) 
A random variable X has the moment generating function about origin as
M y(t) = (0-30) 
Obtain the mean and variance of random variable X. 


The annual gross earnings of a certain pop-singer are a random variable with an expected
value of Rs. 40,00,000/- and a standard deviation of Rs. 8,00,000/-. The singer's manager
receives 15 percent of this amount. Determine the expected value and the standard deviation
of the amount received by the manager. (P.U., B.A./B.Sc.
INTRODUCTION

As discussed in the previous chapter, a discrete probability distribution gives the probability of
every possible value of a discrete random variable. We shall introduce here some important discrete
probability distributions which are often used in statistical theory and analysis.


BINOMIAL PROBABILITY DISTRIBUTION 


Many experiments consist of repeated independent trials, each trial having only two possible
complementary outcomes. For example, the two possible outcomes of a trial may be head and tail,
success and failure, right and wrong, alive and dead, good and defective, infected and not infected and so
on. If the probability of each outcome remains the same throughout the trials, then such trials are called
Bernoulli trials and the experiment having n Bernoulli trials is called a binomial experiment. In other
words, an experiment is called a binomial probability experiment if it possesses the following four
properties:

i) The outcomes of each trial may be classified into one of two categories, conventionally called
Success (S) and Failure (F). It is to be noted that the outcome of interest is called a success and
the other, a failure.
ii) The probability of success, denoted by p, remains constant for all trials.
iii) The successive trials are all independent.
iv) The experiment is repeated a fixed number of times, say n.

When X denotes the number of successes in n trials of a binomial probability experiment, it is
called a binomial random variable and its p.d. is called the Binomial Probability Distribution. The r.v. X
can obviously take on any one of the (n + 1) integer values 0, 1, 2, ..., n.

When the binomial r.v. X assumes a value x, the binomial p.d. is given by:

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \dots, n,$$

where q = 1 - p, the probability of failure on each trial. The binomial p.d. has two parameters n and p and
is usually denoted by b(x; n, p). The binomial probability distribution is appropriate when a random
sample of size n is drawn with replacement from a finite population of size N, or sampling is done from
an infinite population.

The binomial p.d., which is the most widely used distribution in two-outcome situations, was
discovered by the Swiss mathematician Jakob Bernoulli (1654-1705) whose main work on probability,
Ars Conjectandi (the art of conjecturing), was published posthumously in Basel in 1713.
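The binomial formula is straightforward to evaluate with the standard library. The following is a minimal sketch; the function name `binom_pmf` and the choice n = 5, p = 0.5 are assumptions made for illustration only.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x q^(n - x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
print(probs)       # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(sum(probs))  # 1.0
```

The probabilities over x = 0, 1, ..., n add up to one, as they must for any probability distribution.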

%21 Deviation of Binomial Probability Distribution. To derive а formula that gives the 

of successes in n trials for a binomial experiment, we proceed as follows: ` 


experiment has n trials, each of which may result in S or F. The sample space has 2" possible 
points or outcomes, each outcome consisting of a sequence (а), аз, .... аһ}, where each aj, is 
ar F. We desire to find the probability of these outcomes according te the number of successes. 


First, we consider the probability of zero success, i.e. P(X = 0). In case of zero success, every trial results in F and the event consists of a sequence of n F's, i.e. {FF ... F}. Because in each trial, P(S) = p and P(F) = q and the trials are independent, we apply the multiplicative law of probability for independent events and obtain

$$P(FF \ldots F) = P(F)\,P(F) \cdots P(F) \ [n \text{ times}] = q^{n}.$$

Since there is only one sequence of outcomes of n trials resulting in zero success, therefore

$$P(X = 0) = q^{n}.$$

Next, we consider the probability of one success, i.e. P(X = 1). In this case, one trial results in S and the remaining (n-1) trials result in F's. The event consisting of one S and (n-1) F's can occur in n different sequences. One such sequence is {SFF...F} and the probability for this sequence is $pq^{n-1}$. Another possible sequence is {FFSF...F} and the probability for this sequence is the same as that of the first sequence. In other words, the probability for any possible sequence consisting of one S and (n-1) F's is $pq^{n-1}$. But the number of mutually exclusive sequences in which one S and (n-1) F's can occur, is $\binom{n}{1} = n$.

Therefore the probability of exactly one success for all possible sequences combined, is

$$P(X = 1) = \binom{n}{1} p\, q^{n-1}.$$

The above argument may be repeated for X = 2, 3, 4, etc.

Finally, we consider the general case, i.e. P(X = x). The probability of a sequence that contains x successes and (n-x) failures is $p^{x}q^{n-x}$ and there are $\binom{n}{x}$ different sequences in which x successes and (n-x) failures can occur. Therefore the probability of x successes in n trials is

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \qquad \text{for } x = 0, 1, 2, \ldots, n.$$
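The formula just derived can be evaluated directly. The following is a minimal Python sketch and not part of the original text; the function name `binomial_pmf` and the trial values n = 8, p = 2/3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0  # X takes only the integer values 0, 1, ..., n
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

# The n + 1 probabilities are the successive terms of (q + p)**n,
# so their sum should be 1 up to floating-point error.
probs = [binomial_pmf(x, 8, 2 / 3) for x in range(9)]
total = sum(probs)
print(round(total, 12))
```

Returning 0 outside the range 0, 1, ..., n mirrors the fact that the binomial r.v. takes no other values.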


Thus we have obtained a formula for the binomial probability distribution having n trials and probability p for success. The binomial probability distribution derives its name from the fact that the probabilities $\binom{n}{x} p^{x} q^{n-x}$, for x = 0, 1, 2, ..., n are the successive terms of the binomial expansion of $(q+p)^{n}$. That is

$$(q + p)^{n} = \sum_{x=0}^{n} \binom{n}{x} p^{x} q^{n-x}, \qquad \text{where } x = 0, 1, 2, \ldots, n$$

$$= \binom{n}{0} q^{n} + \binom{n}{1} p\,q^{n-1} + \binom{n}{2} p^{2} q^{n-2} + \cdots + \binom{n}{n} p^{n}$$

$$= b(0; n, p) + b(1; n, p) + b(2; n, p) + \cdots + b(n; n, p).$$

The sum of probabilities, i.e. $\sum_{x=0}^{n} \binom{n}{x} p^{x} q^{n-x}$ or $\sum_{x=0}^{n} b(x; n, p) = 1$ because q + p = 1; this is a necessary condition for any probability distribution. It is to be noted that the probability of r or fewer successes is given by $P(X \le r) = \sum_{x=0}^{r} \binom{n}{x} p^{x} q^{n-x}$, where $P(X \le r)$ is the cumulative binomial distribution function of X. There are tables for the cumulative probabilities $P(X \le r) = \sum_{x=0}^{r} b(x; n, p)$ for some values of n and p.


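Example 8.1 A fair coin is tossed 5 times. Find the probabilities of obtaining various numbers of heads.

Let us regard the tossing of a coin as an experiment. Then we observe that

i) each toss of a coin (i.e. each trial) has two possible outcomes, heads (success) and tails (failure);
ii) the probability of a head (success) is p = 1/2 and remains the same for successive tosses;
iii) the successive tosses of the coin are independent; and
iv) the coin is tossed 5 times.

Therefore the r.v. X which denotes the number of heads (successes) has a binomial probability distribution with p = 1/2 and n = 5. The possible values of X are 0, 1, 2, 3, 4 and 5. Hence

$$P(\text{no head}) = P(X=0) = \binom{5}{0}\left(\tfrac12\right)^{0}\left(\tfrac12\right)^{5} = \tfrac{1}{32}$$
$$P(\text{1 head}) = P(X=1) = \binom{5}{1}\left(\tfrac12\right)^{1}\left(\tfrac12\right)^{4} = \tfrac{5}{32}$$
$$P(\text{2 heads}) = P(X=2) = \binom{5}{2}\left(\tfrac12\right)^{2}\left(\tfrac12\right)^{3} = \tfrac{10}{32}$$
$$P(\text{3 heads}) = P(X=3) = \binom{5}{3}\left(\tfrac12\right)^{3}\left(\tfrac12\right)^{2} = \tfrac{10}{32}$$
$$P(\text{4 heads}) = P(X=4) = \binom{5}{4}\left(\tfrac12\right)^{4}\left(\tfrac12\right)^{1} = \tfrac{5}{32}$$
$$P(\text{5 heads}) = P(X=5) = \binom{5}{5}\left(\tfrac12\right)^{5}\left(\tfrac12\right)^{0} = \tfrac{1}{32}$$

These probabilities can also be obtained by expanding the binomial $\left(\tfrac12 + \tfrac12\right)^{5}$. The binomial distribution for the number of heads obtained in 5 tosses of a fair coin is

x          0      1      2      3      4      5
P(X=x)    1/32   5/32   10/32  10/32  5/32   1/32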
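The probabilities of Example 8.1, together with the cumulative values P(X ≤ r), can be reproduced with exact fractions. This is an illustrative sketch only; the variable names `pmf` and `cdf` are assumptions and not notation from the text.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n, p = 5, Fraction(1, 2)      # five tosses of a fair coin
q = 1 - p

# pmf[x] = C(n, x) p^x q^(n - x); cdf[r] = P(X <= r) = pmf[0] + ... + pmf[r]
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))

for x, (f, F) in enumerate(zip(pmf, cdf)):
    print(x, f, F)
```

Note that `Fraction` reduces to lowest terms, so 10/32 is displayed as 5/16.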




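Example 8.2 An event has the probability p = 3/8. Find the complete binomial distribution for n = 5 trials.

Here p = 3/8 so that q = 1 - p = 5/8; and n = 5.

Hence the desired probabilities are the successive terms in the binomial expansion of $\left(\tfrac58 + \tfrac38\right)^{5}$,

i.e. $\left(\tfrac58\right)^{5} + 5\left(\tfrac58\right)^{4}\left(\tfrac38\right) + 10\left(\tfrac58\right)^{3}\left(\tfrac38\right)^{2} + 10\left(\tfrac58\right)^{2}\left(\tfrac38\right)^{3} + 5\left(\tfrac58\right)\left(\tfrac38\right)^{4} + \left(\tfrac38\right)^{5}$

i.e. $\tfrac{1}{32768}\,[3125 + 9375 + 11250 + 6750 + 2025 + 243]$

i.e. [0.0954 + 0.2861 + 0.3433 + 0.2060 + 0.0618 + 0.0074]

We can now write these probabilities in the form of a probability table as below:

x          0        1        2        3        4        5
P(X=x)    0.0954   0.2861   0.3433   0.2060   0.0618   0.0074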


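Example 8.3 Let X have a binomial distribution with n = 4 and p = 1/3. Find P(X = 1), P(X = 3/2), P(X = 3), P(X = 6) and P(X ≤ 2).

The binomial probability distribution for n = 4 and p = 1/3 is

$$f(x) = \binom{4}{x}\left(\tfrac13\right)^{x}\left(\tfrac23\right)^{4-x}, \qquad \text{for } x = 0, 1, 2, 3, 4.$$

$$P(X = 1) = f(1) = \binom{4}{1}\left(\tfrac13\right)\left(\tfrac23\right)^{3} = \tfrac{32}{81}.$$

$P(X = \tfrac32) = f(\tfrac32) = 0$; because a r.v. X with a binomial distribution takes only one of the integer values 0, 1, 2, ..., n.

$$P(X = 3) = f(3) = \binom{4}{3}\left(\tfrac13\right)^{3}\left(\tfrac23\right) = \tfrac{8}{81}.$$

P(X = 6) = f(6) = 0, because X can take only the values 0, 1, 2, 3, 4.

$$P(X \le 2) = \sum_{x=0}^{2} f(x) = f(0) + f(1) + f(2) = \tfrac{16}{81} + \tfrac{32}{81} + \tfrac{24}{81} = \tfrac{72}{81} = \tfrac{8}{9}.$$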
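Example 8.4 A and B play a game in which A's probability of winning is 2/3. In a series of 8 games, what is the probability that A will win (i) exactly 4 games, (ii) at least 4 games, (iii) 6 or more games, (iv) from 3 to 6 games?

We observe that

there are two possible outcomes, i.e. A will win or will not win the game;
the probability of A's winning in each game is p = 2/3;
the successive games are independently won or lost; and
there are 8 games.

Therefore the binomial probability distribution with n = 8 and p = 2/3 is appropriate. Let X denote the number of games won by A. Then

$$\text{(i)}\quad P(X = 4) = \binom{8}{4}\left(\tfrac23\right)^{4}\left(\tfrac13\right)^{4} = \tfrac{1120}{6561} = 0.1707$$

$$\text{(ii)}\quad P(X \ge 4) = 1 - P(X \le 3) = 1 - \sum_{x=0}^{3}\binom{8}{x}\left(\tfrac23\right)^{x}\left(\tfrac13\right)^{8-x}$$
$$= 1 - \tfrac{1}{6561}\,[1 + 16 + 112 + 448] = 1 - \tfrac{577}{6561} = 0.9121$$

$$\text{(iii)}\quad P(X \ge 6) = \sum_{x=6}^{8}\binom{8}{x}\left(\tfrac23\right)^{x}\left(\tfrac13\right)^{8-x} = \tfrac{64}{6561}\,[28 + 16 + 4] = \tfrac{3072}{6561} = 0.4682$$

$$\text{(iv)}\quad P(3 \le X \le 6) = \sum_{x=3}^{6}\binom{8}{x}\left(\tfrac23\right)^{x}\left(\tfrac13\right)^{8-x} = \tfrac{1}{6561}\,[448 + 1120 + 1792 + 1792] = \tfrac{8 \times 644}{6561} = \tfrac{5152}{6561} = 0.7852$$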
Example 8.5 The experience of a house-agent indicates that he can provide satisfactory accommodation for 75 percent of the clients who come to him. If on a particular occasion, 6 clients approach him independently, calculate the probability that (i) less than 4 clients, (ii) exactly 4 clients, (iii) at least 5 clients, will get satisfactory accommodation.

We observe that

a) there are two possible outcomes, i.e. each client will get or will not get accommodation,
b) on each occasion, the probability of getting accommodation is p = 3/4,
c) clients approach the house-agent independently, and
d) there are 6 clients.

Therefore the binomial probability distribution with n = 6 and p = 3/4 is appropriate to compute the desired probabilities.

Let X denote the number of clients who get satisfactory accommodation. Then we need (i) P(X < 4), (ii) P(X = 4) and (iii) P(X ≥ 5). Hence,

$$\text{(i)}\quad P(X < 4) = \sum_{x=0}^{3}\binom{6}{x}\left(\tfrac34\right)^{x}\left(\tfrac14\right)^{6-x} = \left(\tfrac14\right)^{6}\,[1 + (6)(3) + (15)(9) + (20)(27)] = \tfrac{694}{4096} = 0.169$$

$$\text{(ii)}\quad P(X = 4) = \binom{6}{4}\left(\tfrac34\right)^{4}\left(\tfrac14\right)^{2} = \tfrac{15 \times 81}{4096} = \tfrac{1215}{4096} = 0.297$$

$$\text{(iii)}\quad P(X \ge 5) = \sum_{x=5}^{6}\binom{6}{x}\left(\tfrac34\right)^{x}\left(\tfrac14\right)^{6-x} = \left(\tfrac14\right)^{6}\,[(6)(3)^{5} + (3)^{6}] = \tfrac{2187}{4096} = 0.534$$

8.2.2 Binomial Frequency Distribution. If the binomial probability distribution is multiplied by N, the number of experiments or sets, the resulting distribution is known as the binomial frequency distribution. Thus the expected frequency of x successes in N experiments is $N\binom{n}{x} p^{x} q^{n-x}$. It should be noted that the n independent trials constitute one experiment or one set.
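Example 8.6 Six dice are thrown 729 times. How many times do you expect at least three dice to show a five or a six?

The probability of getting a 5 or a 6 with one die is p = 2/6 = 1/3. Since 6 dice are thrown and there are 729 sets, the binomial frequency distribution is given by

$$729\left(\tfrac23 + \tfrac13\right)^{6}.$$

Hence the expected number of times at least three dice show a five or a six is

$$729\sum_{x=3}^{6}\binom{6}{x}\left(\tfrac13\right)^{x}\left(\tfrac23\right)^{6-x} = 160 + 60 + 12 + 1 = 233.$$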
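The expected frequencies of Example 8.6 follow from multiplying each binomial probability by N. A short sketch with exact fractions is given below; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

N = 729                 # number of sets (throws of the six dice)
n = 6                   # dice per throw, i.e. trials per set
p = Fraction(1, 3)      # probability of a five or a six on one die
q = 1 - p

# Expected frequency of x successes: N * C(n, x) * p^x * q^(n - x)
expected = [N * comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
at_least_three = sum(expected[3:])
print(expected, at_least_three)
```

Because N = 3^6, every expected frequency here is a whole number, which is why the hand calculation gives 160 + 60 + 12 + 1 exactly.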


Example 8.7 A certain event is believed to follow the binomial distribution. In 1024 samples of 5, the result was observed once 405 times and twice 270 times. Find p and q.

The first three terms in the expansion of the Binomial Frequency Distribution $N(q+p)^{n}$, corresponding to x = 0, 1 and 2, are $Nq^{n}$, $N\binom{n}{1}q^{n-1}p$ and $N\binom{n}{2}q^{n-2}p^{2}$.

We are given N = 1024, n = 5 and the following information:

$$1024\binom{5}{1} q^{4} p = 405,$$
$$1024\binom{5}{2} q^{3} p^{2} = 270.$$

Dividing the second equation by the first, we get

$$\frac{10\,q^{3}p^{2}}{5\,q^{4}p} = \frac{270}{405}, \quad\text{i.e.}\quad \frac{2p}{q} = \frac{2}{3},$$

or 3p = q or 3p = 1 - p or 4p = 1.

Hence p = 1/4 and q = 3/4.
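8.2.3 Properties of the Binomial Probability Distribution. The properties of the binomial distribution include the mean number of successes, the variance of the number of successes, skewness and kurtosis, etc. and the shape of the distribution. Some of the properties are discussed below.

1) Let X be a random variable with the binomial distribution b(x; n, p). Then its mean and variance are given by $\mu = np$ and $\sigma^{2} = npq$ respectively.

Now Mean,
$$\mu = E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^{x} q^{n-x}, \qquad \text{where } x = 0, 1, 2, \ldots, n$$
$$= 0\cdot q^{n} + 1\cdot\binom{n}{1} p\,q^{n-1} + 2\cdot\binom{n}{2} p^{2} q^{n-2} + \cdots + n\,p^{n}$$
$$= np\,[q^{n-1} + (n-1)\,p\,q^{n-2} + \cdots + p^{n-1}]$$
$$= np\,(q + p)^{n-1}$$
$$= np, \quad \text{because } q + p = 1.$$

Alternative Method.

$$\text{Mean} = E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^{x} q^{n-x}.$$

But
$$x\binom{n}{x} = \frac{x\,n\,(n-1)!}{x\,(x-1)!\,(n-x)!} = n\binom{n-1}{x-1}.$$

$$\therefore\ E(X) = n\sum_{x=1}^{n}\binom{n-1}{x-1} p^{x} q^{n-x}, \quad \text{for } x = 1, 2, \ldots, n$$
(since the first term in the summation (x = 0) being zero is omitted)
$$= np\sum_{x=1}^{n}\binom{n-1}{x-1} p^{x-1} q^{n-x}.$$

Substituting y = x - 1 and m = n - 1 in the summation, we get
$$E(X) = np\sum_{y=0}^{m}\binom{m}{y} p^{y} q^{m-y}$$
(as x ranges from 1 to n, y ranges from 0 to n - 1, i.e. m)
$$= np \quad (\because \text{the summation is the expansion of } (q+p)^{m} = 1).$$

Hence $\mu = np$. In other words, the mean number of successes is np. Similarly the mean number of failures is nq.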
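By definition, the variance $\sigma^{2}$ is computed as

$$\sigma^{2} = E(X - \mu)^{2} = E(X^{2}) - [E(X)]^{2}.$$

But
$$E(X^{2}) = E[X(X-1) + X] = E[X(X-1)] + E(X) = E[X(X-1)] + np.$$

Now
$$E[X(X-1)] = \sum_{x=0}^{n} x(x-1)\binom{n}{x} p^{x} q^{n-x}$$
$$= \sum_{x=2}^{n} x(x-1)\,\frac{n(n-1)(n-2)!}{x(x-1)(x-2)!\,(n-x)!}\; p^{2} p^{x-2} q^{n-x}$$
$$= n(n-1)p^{2}\sum_{x=2}^{n}\frac{(n-2)!}{(x-2)!\,(n-x)!}\; p^{x-2} q^{n-x}$$
(x starts at 2 since x = 0, 1 add nothing to the sum).

The term (n - x) may be written as [(n - 2) - (x - 2)]. Substituting y = x - 2 and m = n - 2 in the summation, we obtain

$$E[X(X-1)] = n(n-1)p^{2}\sum_{y=0}^{m}\frac{m!}{y!\,(m-y)!}\; p^{y} q^{m-y} = n(n-1)p^{2}\sum_{y=0}^{m}\binom{m}{y} p^{y} q^{m-y}$$
$$= n(n-1)p^{2} \quad (\because \text{the summation is 1}).$$

Thus
$$\sigma^{2} = E(X^{2}) - [E(X)]^{2}$$
$$= E[X(X-1)] + E(X) - [E(X)]^{2}$$
$$= n(n-1)p^{2} + np - (np)^{2}$$
$$= n^{2}p^{2} - np^{2} + np - n^{2}p^{2}$$
$$= np - np^{2} = np(1 - p) = npq, \text{ and}$$
$$\sigma = \sqrt{npq}.$$

Hence the variance of the number of successes is npq, and the standard deviation is $\sqrt{npq}$.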
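The results μ = np and σ² = npq can be checked numerically by summing over the distribution, using the same route through E[X(X-1)] as the derivation. The sketch below is illustrative only; the function name `binomial_moments` and the values n = 8, p = 2/3 are assumptions.

```python
from math import comb

def binomial_moments(n: int, p: float):
    """Return (mean, variance) of b(x; n, p) computed by direct summation."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))                  # E(X)
    fact2 = sum(x * (x - 1) * f for x, f in enumerate(pmf))       # E[X(X-1)]
    variance = fact2 + mean - mean**2                             # E(X^2) - [E(X)]^2
    return mean, variance

n, p = 8, 2 / 3
mean, variance = binomial_moments(n, p)
print(mean, n * p)
print(variance, n * p * (1 - p))
```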


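2) Higher Moments of the distribution are found as below:

By definition, the moments about the origin are given by the relation
$$\mu'_{r} = E(X^{r}).$$

Now
$$\mu'_{1} = E(X) = np,$$
$$\mu'_{2} = E(X^{2}) = n(n-1)p^{2} + np,$$
$$\mu'_{3} = E(X^{3}).$$

But $X^{3} = X(X-1)(X-2) + 3X(X-1) + X$,
$$\therefore\ E(X^{3}) = E[X(X-1)(X-2)] + 3E[X(X-1)] + E(X).$$

Now
$$E(X) = \sum_{x=0}^{n} x\binom{n}{x} p^{x} q^{n-x} = np,$$
$$3E[X(X-1)] = 3\sum_{x=0}^{n} x(x-1)\binom{n}{x} p^{x} q^{n-x} = 3n(n-1)p^{2}, \text{ and}$$
$$E[X(X-1)(X-2)] = \sum_{x=0}^{n} x(x-1)(x-2)\binom{n}{x} p^{x} q^{n-x}.$$

But
$$x(x-1)(x-2)\binom{n}{x} = \frac{x(x-1)(x-2)\,n(n-1)(n-2)(n-3)!}{x(x-1)(x-2)(x-3)!\,(n-x)!} = n(n-1)(n-2)\binom{n-3}{x-3}.$$

$$\therefore\ E[X(X-1)(X-2)] = n(n-1)(n-2)\,p^{3}\sum_{x=3}^{n}\binom{n-3}{x-3} p^{x-3} q^{n-x}$$
(x starts at 3 since x = 0, 1, 2 add nothing to the summation).

Substituting y = x - 3 and m = n - 3 in the summation, we have
$$E[X(X-1)(X-2)] = n(n-1)(n-2)\,p^{3}\sum_{y=0}^{m}\binom{m}{y} p^{y} q^{m-y} = n(n-1)(n-2)\,p^{3}.$$

$$\therefore\ \mu'_{3} = n(n-1)(n-2)p^{3} + 3n(n-1)p^{2} + np.$$

$$\mu'_{4} = E(X^{4}) = \sum_{x=0}^{n} x^{4}\binom{n}{x} p^{x} q^{n-x}.$$

Writing $x^{4}$ as $x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x$ and proceeding as above, we get
$$\mu'_{4} = n(n-1)(n-2)(n-3)p^{4} + 6n(n-1)(n-2)p^{3} + 7n(n-1)p^{2} + np.$$

The moments about the mean are then
$$\mu_{3} = \mu'_{3} - 3\mu'_{2}\mu'_{1} + 2(\mu'_{1})^{3}$$
$$= [n(n-1)(n-2)p^{3} + 3n(n-1)p^{2} + np] - 3np\,[n(n-1)p^{2} + np] + 2n^{3}p^{3}$$
$$= np\,(1 - 3p + 2p^{2}) = np\,(1-p)(1-2p)$$
$$= npq\,(q - p);$$

$$\mu_{4} = \mu'_{4} - 4\mu'_{3}\mu'_{1} + 6\mu'_{2}(\mu'_{1})^{2} - 3(\mu'_{1})^{4}.$$

Substitution and simplification give
$$\mu_{4} = n^{4}[p^{4} - 4p^{4} + 6p^{4} - 3p^{4}] + n^{3}[-6p^{4} + 6p^{3} + 12p^{4} - 12p^{3} - 6p^{4} + 6p^{3}]$$
$$\qquad + n^{2}[11p^{4} - 18p^{3} + 7p^{2} - 8p^{4} + 12p^{3} - 4p^{2}] + n[-6p^{4} + 12p^{3} - 7p^{2} + p]$$
$$= 3n^{2}p^{2}(1-p)^{2} + np(1-p)(1 - 6p + 6p^{2})$$
$$= 3n^{2}p^{2}q^{2} + npq\,(1 - 6pq) = npq\,[1 + 3(n-2)pq].$$

Now
$$\beta_{1} = \frac{\mu_{3}^{2}}{\mu_{2}^{3}} = \frac{[npq(q-p)]^{2}}{(npq)^{3}} = \frac{(q-p)^{2}}{npq} = \frac{(1-2p)^{2}}{npq},$$
$$\beta_{2} = \frac{\mu_{4}}{\mu_{2}^{2}} = \frac{3n^{2}p^{2}q^{2} + npq(1-6pq)}{(npq)^{2}} = 3 + \frac{1 - 6pq}{npq}.$$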


3) The shape of the binomial probability distribution depends on the values of the two parameters p and n. The sketches indicate the influence of p and n on the shape of the distribution.

[Figure: sketches of b(x; 5, 0.2), b(x; 5, 0.4), b(x; 5, 0.6), b(x; 10, 0.3) and b(x; 10, 0.5)]

Thus we observe that, when p < 1/2, the distribution is positively skewed and when p > 1/2, the distribution becomes negatively skewed. In general, when p ≠ q, the distribution will be skewed; the more the difference between p and q, the greater the skewness will be. When p = 1/2, the distribution is always symmetrical. As n, the number of trials, increases to ∞, $\beta_{1} \to 0$ and $\beta_{2} \to 3$. Thus for large n, the binomial probability distribution is symmetrical and mesokurtic.

Example 8.8 In the B.A. Examination, 24 candidates offered Statistics. If the probability of passing the subject be 1/3, find the mean and the dispersion of the distribution.

Here n = 24 and p = 1/3 so that q = 1 - p = 2/3.

Hence mean = np = 24 × 1/3 = 8, and

variance = npq = 24 × 1/3 × 2/3 = 16/3 = 5.33.
8.2.4 The Recurrence Formula for the Binomial Distribution. Beginning with the value of P(X = 0), probabilities for other values of X, the number of successes, can be computed more easily by the recurrence formula

$$P(X = x) = \frac{n - x + 1}{x}\cdot\frac{p}{q}\cdot P(X = x - 1),$$

which is derived as follows:

Let X be a random variable with the binomial probability distribution b(x; n, p). Then

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\; p^{x} q^{n-x}, \text{ and}$$

$$P(X = x-1) = \binom{n}{x-1} p^{x-1} q^{n-x+1} = \frac{n!}{(x-1)!\,(n-x+1)!}\; p^{x-1} q^{n-x+1}.$$

Dividing P(X = x) by P(X = x - 1), we get

$$\frac{P(X = x)}{P(X = x-1)} = \frac{n!}{x!\,(n-x)!}\cdot\frac{(x-1)!\,(n-x+1)!}{n!}\cdot\frac{p^{x} q^{n-x}}{p^{x-1} q^{n-x+1}} = \frac{n-x+1}{x}\cdot\frac{p}{q}.$$

Hence
$$P(X = x) = \frac{n-x+1}{x}\cdot\frac{p}{q}\cdot P(X = x-1), \qquad \text{for } x = 1, 2, \ldots, n.$$

Although computations can be obtained more easily by this method, great care must be taken in the calculations, as an error at one step affects every later value.

Example 8.9 Given a binomial distribution with mean 3.20 and variance 1.152, find the complete binomial probability distribution.

If X is a binomial r.v. with parameters n and p, then E(X) = np and Var(X) = npq.

Now E(X) = 3.20 so np = 3.20,

and Var(X) = 1.152 so npq = 1.152.

Substituting for np in the second equation, we obtain

(3.20) q = 1.152

or q = 1.152/3.20 = 0.36, so p = 1 - q = 0.64,

and n(0.64) = 3.20 gives n = 5.

Thus n = 5 and p = 0.64 so that X is b(x; 5, 0.64).

$$P(X = x) = \binom{5}{x}(0.64)^{x}(0.36)^{5-x},$$

$$P(X = 0) = (0.36)^{5} = 0.006047.$$

Beginning with the value of P(X = 0), we compute the probabilities of other values of X, using the recurrence formula

$$P(X = x) = \frac{n-x+1}{x}\cdot\frac{p}{q}\cdot P(X = x-1).$$

$$P(X = 1) = \frac{5}{1}\left(\frac{0.64}{0.36}\right)(0.006047) = 0.053751,$$
$$P(X = 2) = \frac{4}{2}\left(\frac{0.64}{0.36}\right)(0.053751) = 0.191115,$$
$$P(X = 3) = \frac{3}{3}\left(\frac{0.64}{0.36}\right)(0.191115) = 0.339760,$$
$$P(X = 4) = \frac{2}{4}\left(\frac{0.64}{0.36}\right)(0.339760) = 0.302009,$$
$$P(X = 5) = \frac{1}{5}\left(\frac{0.64}{0.36}\right)(0.302009) = 0.107381.$$

The sum of probabilities turns out to be 1.000063 instead of 1. The error has been introduced by the rounding process.
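The recurrence can be run in full floating-point precision to see how the probabilities behave when the starting value is not rounded. This is a minimal sketch; the function name `binomial_by_recurrence` is an assumption, and the parameters n = 5, p = 0.64 are those of Example 8.9.

```python
def binomial_by_recurrence(n: int, p: float) -> list:
    """Build P(X = 0), ..., P(X = n) from P(X = 0) = q**n using
    P(X = x) = ((n - x + 1) / x) * (p / q) * P(X = x - 1)."""
    q = 1 - p
    probs = [q**n]
    for x in range(1, n + 1):
        probs.append((n - x + 1) / x * (p / q) * probs[-1])
    return probs

probs = binomial_by_recurrence(5, 0.64)
for x, f in enumerate(probs):
    print(x, round(f, 6))
print(sum(probs))
```

Every later value is a multiple of the starting value, so rounding P(X = 0) up to six decimals inflates all six probabilities by the same small factor.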

8.2.5 Fitting a Binomial Distribution to Observed Data. Fitting a binomial distribution to a given frequency distribution involves (i) estimating the values of p and n, the two parameters that completely determine a binomial distribution, and (ii) calculating the probabilities as well as the expected frequencies of x = 0, 1, 2, ..., n. Assuming that the given frequency distribution has the same mean as the fitted theoretical binomial distribution, we calculate the mean of the observed frequency distribution, $\bar{X}$, and taking it as the estimate of $\mu$, we equate it to its theoretical value, i.e. np, where n is taken as the largest x value given. Having found the value of p, we compute the expected frequencies. The procedure is illustrated by the following example.


Example 8.10 Fit a binomial distribution to the following data, obtained by tossing a coin 5 times, 200 times in all:

No. of heads (x)    0     1     2     3     4     5     Total
Frequency (f)       12    ...   ...   ...   ...   1     200

(P.U., B.A. ...)

To fit a binomial distribution, we need to find n and p. Here n = 5, the largest x-value. To find p, we use the relationship $\bar{X} = np$.

The mean of the observed distribution is $\bar{X} = 1.99$. Using the relation $\bar{X} = np$, we get 5p = 1.99 or p = 0.398.

Letting the r.v. X represent the number of heads, this gives the fitted binomial distribution as

$$b(x; 5, 0.398) = \binom{5}{x}(0.398)^{x}(0.602)^{5-x}.$$

Now the probabilities and expected frequencies are calculated as below:

No. of heads (x)    Probability f(x)                                   Expected frequency
0                   q^5 = (0.602)^5 = 0.07907                          15.81
1                   5 q^4 p = 5 (0.602)^4 (0.398) = 0.26136            52.27
2                   10 q^3 p^2 = 10 (0.602)^3 (0.398)^2 = 0.34559      69.12
3                   10 q^2 p^3 = 10 (0.602)^2 (0.398)^3 = 0.22847      45.69
4                   5 q p^4 = 5 (0.602) (0.398)^4 = 0.07553            15.11
5                   p^5 = (0.398)^5 = 0.00998                          2.00

The expected frequencies are obtained by multiplying each of the probabilities by 200. The probabilities can also be calculated, using the binomial recurrence formula:

$$P(X = x) = \frac{n-x+1}{x}\cdot\frac{p}{q}\cdot P(X = x-1).$$
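The fitting procedure (estimate p from the observed mean, then scale the probabilities by the total frequency) can be sketched as follows. The function name `fit_binomial` is hypothetical; the inputs are the observed mean 1.99, n = 5 and total frequency 200 quoted in Example 8.10.

```python
from math import comb

def fit_binomial(mean: float, n: int, total: float):
    """Estimate p by equating the observed mean to n*p, then return
    (p, fitted probabilities, expected frequencies)."""
    p = mean / n
    q = 1 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    expected = [total * f for f in probs]
    return p, probs, expected

p_hat, probs, expected = fit_binomial(mean=1.99, n=5, total=200)
for x, (f, e) in enumerate(zip(probs, expected)):
    print(x, round(f, 5), round(e, 2))
```

The expected frequencies need not be whole numbers; they are compared with the observed frequencies as they stand.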


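8.2.6 Moment Generating and Cumulant Generating Functions of the Binomial Distribution. The m.g.f. of the binomial probability distribution b(x; n, p) is derived as below:

$$M_{0}(t) = E(e^{tX}) \quad \text{(by definition)}$$
$$= \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^{x} q^{n-x}$$
$$= \sum_{x=0}^{n}\binom{n}{x}(pe^{t})^{x} q^{n-x} = (q + pe^{t})^{n}.$$

The expansion of this binomial is purely algebraic and need not be interpreted in terms of probabilities.

We get the moments by differentiating $M_{0}(t)$ once, twice, etc. with respect to t and putting t = 0. Thus

$$\mu'_{1} = E(X) = \left[\frac{d}{dt} M_{0}(t)\right]_{t=0} = \left[npe^{t}(q + pe^{t})^{n-1}\right]_{t=0} = np; \text{ and}$$

$$\mu'_{2} = E(X^{2}) = \left[\frac{d^{2}}{dt^{2}} M_{0}(t)\right]_{t=0} = \left[npe^{t}(q + pe^{t})^{n-1}\right]_{t=0} + \left[n(n-1)p^{2}e^{2t}(q + pe^{t})^{n-2}\right]_{t=0}$$
$$= np + n(n-1)p^{2}.$$

$$\mu_{2} = \mu'_{2} - (\mu'_{1})^{2} = npq.$$

In a similar way, the higher moments are obtained.

The cumulant generating function is given by

$$\kappa(t) = \log_{e} M_{0}(t) = n\log(q + pe^{t}) = n\log\left[1 + p\left(t + \frac{t^{2}}{2!} + \frac{t^{3}}{3!} + \cdots\right)\right].$$

Expanding the log and comparing powers in t, we find the first four cumulants to be

$$\kappa_{1} = \mu'_{1} = np;$$
$$\kappa_{2} = \mu_{2} = np(1-p) = npq;$$
$$\kappa_{3} = \mu_{3} = np(1-p)(1-2p) = npq(q - p);$$
$$\kappa_{4} = np(1-p)(1 - 6p + 6p^{2}) = npq(1 - 6pq).$$

Example 8.11 Prove for the binomial distribution, the following relation

$$\mu_{r+1} = pq\left[nr\,\mu_{r-1} + \frac{d\mu_{r}}{dp}\right]$$

and find $\mu_{2}$, $\mu_{3}$ and $\mu_{4}$. (P.U., M.A. (Stats.), 1969)

By definition, the rth moment about the mean is

$$\mu_{r} = \sum_{x=0}^{n}(x - np)^{r}\binom{n}{x} p^{x} q^{n-x}, \qquad \text{where } q = 1 - p.$$

Differentiating with respect to p, we get

$$\frac{d\mu_{r}}{dp} = -nr\sum_{x}(x - np)^{r-1}\binom{n}{x} p^{x} q^{n-x} + \sum_{x}(x - np)^{r}\binom{n}{x}\left[x\,p^{x-1} q^{n-x} - (n-x)\,p^{x} q^{n-x-1}\right]$$

$$= -nr\,\mu_{r-1} + \sum_{x}(x - np)^{r}\binom{n}{x} p^{x} q^{n-x}\left[\frac{x}{p} - \frac{n-x}{q}\right]$$

$$= -nr\,\mu_{r-1} + \frac{1}{pq}\sum_{x}(x - np)^{r+1}\binom{n}{x} p^{x} q^{n-x} \qquad \left(\because\ \frac{x}{p} - \frac{n-x}{q} = \frac{x - np}{pq}\right)$$

$$= -nr\,\mu_{r-1} + \frac{1}{pq}\,\mu_{r+1}.$$

Multiplying both sides by pq and transposing, we get

$$\mu_{r+1} = pq\left[nr\,\mu_{r-1} + \frac{d\mu_{r}}{dp}\right].$$

Putting r = 1, 2 and 3, we get

$$\mu_{2} = pq\left[n\mu_{0} + \frac{d\mu_{1}}{dp}\right] = npq \qquad (\because\ \mu_{0} = 1,\ \mu_{1} = 0)$$

$$\mu_{3} = pq\left[2n\mu_{1} + \frac{d\mu_{2}}{dp}\right] = pq\left[0 + \frac{d}{dp}(np - np^{2})\right]$$
$$= pq\,[0 + n(1 - 2p)] = pq\,[nq - np] = npq\,(q - p),$$

$$\mu_{4} = pq\left[3n\mu_{2} + \frac{d\mu_{3}}{dp}\right] = pq\left[3n\cdot npq + \frac{d}{dp}\left\{(np - np^{2})(1 - 2p)\right\}\right]$$
$$= 3n^{2}p^{2}q^{2} + npq\,(1 - 6pq).$$

8.3 HYPERGEOMETRIC PROBABILITY DISTRIBUTION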
There are many experiments in which the condition of independence is violated and the probability of success does not remain constant for all trials. Such experiments are called hypergeometric experiments. In other words, a hypergeometric experiment has the following properties:
i) The outcomes of each trial may be classified into one of two categories, success and failure.
ii) The probability of success changes on each trial.
iii) The successive trials are dependent.
iv) The experiment is repeated a fixed number of times.

The number of successes, X, in a hypergeometric experiment is called a hypergeometric r.v. and its probability distribution is called the hypergeometric distribution. When the hypergeometric r.v. X assumes a value x, the hypergeometric p.d. is given by the formula,

$$P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \text{for } x \text{ such that } 0 \le x \le n \text{ and } 0 \le x \le k,$$

where N = number of units in the set or population (a positive integer),
n = number of units in the subset or sample (a positive integer), and
k = number of units in the population that are classified as successes.

The hypergeometric p.d. has three parameters N, n and k, (or N, n and p = k/N) and is denoted by h(x; N, n, k). The hypergeometric p.d. is appropriate when
i) a random sample of size n is drawn without replacement from a finite population of N units;
ii) k of the units are of one kind (classified as success) and the remaining N - k of another kind (classified as failure).

8.3.1 Derivation of Hypergeometric Distribution. Suppose a set contains N elements of which k are classified as success and N - k are classified as failure; the two classes are mutually exclusive and exhaustive, and we select a subset of n elements (n ≤ N) from the set without replacement.

Then the total number of ways in which a subset of n elements can be chosen from N is $\binom{N}{n}$.

Let X denote the number of successes and let X = x if and only if we choose x successes from the k successes and n - x failures from the N - k failures in the set. Then the number of ways in which x successes and n - x failures can be chosen, is $\binom{k}{x}\binom{N-k}{n-x}$.

Hence the probability that x of the n elements are successes, is

$$P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \text{for } x \text{ such that } 0 \le x \le n \text{ and } 0 \le x \le k.$$

This is the required formula for the hypergeometric probabilities.

To prove that the sum of the hypergeometric probabilities is one, we use the following useful mathematical result (or formula):

$$\sum_{x=0}^{n}\binom{k}{x}\binom{N-k}{n-x} = \binom{N}{n},$$

which is obtained by expanding both sides of

$$(1+x)^{k}(1+x)^{N-k} = (1+x)^{N}$$

and equating the coefficients of $x^{n}$. Thus

$$\sum_{x=0}^{n} h(x; N, n, k) = \frac{1}{\binom{N}{n}}\sum_{x=0}^{n}\binom{k}{x}\binom{N-k}{n-x} = \frac{\binom{N}{n}}{\binom{N}{n}} = 1.$$

The hypergeometric distribution derives its name from the fact that its probability generating function can be put in the form of a hypergeometric series.
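The hypergeometric formula and the fact that its probabilities sum to one can be illustrated with exact arithmetic. This is a sketch only; the function name `hypergeom_pmf` and the illustrative values N = 10, n = 4, k = 4 are assumptions.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, k: int) -> Fraction:
    """h(x; N, n, k) = C(k, x) * C(N - k, n - x) / C(N, n)."""
    # Impossible values: more successes than drawn or available,
    # or more failures than the population contains.
    if x < 0 or x > n or x > k or n - x > N - k:
        return Fraction(0)
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

dist = [hypergeom_pmf(x, 10, 4, 4) for x in range(5)]
print(dist, sum(dist))
```

The sum equals 1 exactly, which is the identity obtained above by equating coefficients of x^n.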


Example 8.12 An urn contains 4 red balls and 6 black balls. A sample of 4 balls is selected from the urn without replacement. Let X be the number of red balls contained in the sample, then find the probability distribution for X.

X is a hypergeometric r.v. because
the results of each draw may be classified as either red (success) or black (failure),
the probability of success changes on each draw,
the successive draws are dependent as the selection is made without replacement,
the drawing is repeated a fixed number of times (n = 4).

Here N = 4 + 6 = 10, k = 4, n = 4 and the possible values of X are 0, 1, 2, 3, and 4. Therefore the probabilities of these possible outcomes are

$$P(X = 0) = h(0; 10, 4, 4) = \frac{\binom{4}{0}\binom{6}{4}}{\binom{10}{4}} = \frac{15}{210}$$
$$P(X = 1) = h(1; 10, 4, 4) = \frac{\binom{4}{1}\binom{6}{3}}{\binom{10}{4}} = \frac{80}{210}$$
$$P(X = 2) = h(2; 10, 4, 4) = \frac{\binom{4}{2}\binom{6}{2}}{\binom{10}{4}} = \frac{90}{210}$$
$$P(X = 3) = h(3; 10, 4, 4) = \frac{\binom{4}{3}\binom{6}{1}}{\binom{10}{4}} = \frac{24}{210}$$
$$P(X = 4) = h(4; 10, 4, 4) = \frac{\binom{4}{4}\binom{6}{0}}{\binom{10}{4}} = \frac{1}{210}$$

The probability distribution of X is as follows:

x          0        1        2        3        4
P(X=x)    15/210   80/210   90/210   24/210   1/210


Example 8.13 The names of 5 men and 5 women are written on slips of paper and placed in a box. Four names are drawn. What is the probability that 2 are men and 2 are women?

Let X denote the number of men. Then

N = 5 + 5 = 10 names to be drawn from;
n = 4, (here possible values of x are 0, 1, 2, 3, 4, i.e. n) and
k = 5.

Hence the hypergeometric distribution is

$$h(x; 10, 4, 5) = \frac{\binom{5}{x}\binom{5}{4-x}}{\binom{10}{4}}, \qquad x = 0, 1, 2, 3, 4.$$

The required probability, i.e. P(X = 2), is

$$h(2; 10, 4, 5) = \frac{\binom{5}{2}\binom{5}{2}}{\binom{10}{4}} = \frac{100}{210} = \frac{10}{21}.$$
Example 8.14 What is the probability that a poker hand will contain exactly 2 aces?

Let us regard the 4 aces as successes and the 48 non-aces as failures. Then we have
N = 52, n = 5, (number of cards in a poker hand),
k = 4 and x = 2. (here possible values of x are 0, 1, 2, 3 and 4, i.e. k)

The probability that a poker hand will contain exactly 2 aces is

$$P(X = 2) = h(2; 52, 5, 4) = \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}} = \frac{103776}{2598960} = 0.0399.$$

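8.3.2 Properties of Hypergeometric Distribution. The important properties of the hypergeometric probability distribution are given here.

1) The mean and variance of the hypergeometric probability distribution are $\mu = np$ and $\sigma^{2} = npq\cdot\frac{N-n}{N-1}$, where $p = \frac{k}{N}$ and q = 1 - p.

Let X have the hypergeometric probability distribution

$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \text{for } x \text{ such that } 0 \le x \le n \text{ and } 0 \le x \le k.$$

Then
$$\mu = E(X) = \sum_{x=0}^{n} x\,\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}
= \sum_{x=1}^{n} x\,\frac{k\,(k-1)!}{x\,(x-1)!\,(k-x)!}\cdot\frac{\binom{N-k}{n-x}}{\binom{N}{n}}
= \frac{k}{\binom{N}{n}}\sum_{x=1}^{n}\binom{k-1}{x-1}\binom{N-k}{n-x}.$$

Let y = x - 1, then
$$\mu = \frac{k}{\binom{N}{n}}\sum_{y=0}^{n-1}\binom{k-1}{y}\binom{N-k}{n-1-y} \qquad [\text{when } x = 1,\ y = 0, \text{ and when } x = n,\ y = n-1]$$
$$= \frac{k\binom{N-1}{n-1}}{\binom{N}{n}} = k\cdot\frac{n\,(n-1)!\,(N-n)!}{N\,(N-1)!}\cdot\frac{(N-1)!}{(n-1)!\,(N-n)!}$$
$$= \frac{nk}{N} = np, \qquad \text{where } p = \frac{k}{N}.$$

Thus the mean of the hypergeometric distribution is the same as that of the binomial distribution.

By definition, the variance $\sigma^{2}$ is given by
$$\sigma^{2} = E(X - \mu)^{2} = E(X^{2}) - \mu^{2}.$$

Now
$$E(X^{2}) = E[X + X(X-1)] = E(X) + E[X(X-1)]$$
$$= \frac{nk}{N} + \sum_{x=0}^{n} x(x-1)\,h(x; N, n, k)$$
$$= \frac{nk}{N} + \frac{k(k-1)}{\binom{N}{n}}\sum_{x=2}^{n}\binom{k-2}{x-2}\binom{N-k}{n-x}.$$

Let y = x - 2, then
$$E(X^{2}) = \frac{nk}{N} + \frac{k(k-1)}{\binom{N}{n}}\sum_{y=0}^{n-2}\binom{k-2}{y}\binom{N-k}{n-2-y} \qquad [\text{when } x = 2,\ y = 0, \text{ and when } x = n,\ y = n-2]$$
$$= \frac{nk}{N} + \frac{k(k-1)\binom{N-2}{n-2}}{\binom{N}{n}} = \frac{nk}{N} + \frac{k(k-1)\,n(n-1)}{N(N-1)}.$$

Hence
$$\sigma^{2} = E(X^{2}) - \mu^{2} = \frac{nk}{N} + \frac{k(k-1)\,n(n-1)}{N(N-1)} - \frac{n^{2}k^{2}}{N^{2}}$$
$$= \frac{nk\,(N-k)}{N^{2}}\cdot\frac{N-n}{N-1}$$
$$= npq\cdot\frac{N-n}{N-1}, \qquad \text{where } p = \frac{k}{N} \text{ and } q = \frac{N-k}{N}.$$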
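The two results μ = np and σ² = npq(N-n)/(N-1) can be checked by direct summation over the distribution. The following sketch uses exact fractions; the values N = 10, n = 4, k = 4 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

N, n, k = 10, 4, 4
p = Fraction(k, N)
q = 1 - p

# pmf[x] = h(x; N, n, k) for every attainable x
pmf = {x: Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))
       for x in range(0, min(n, k) + 1)}

mean = sum(x * f for x, f in pmf.items())                      # E(X)
variance = sum(x * x * f for x, f in pmf.items()) - mean**2    # E(X^2) - mu^2

print(mean, n * p)
print(variance, n * p * q * Fraction(N - n, N - 1))
```

The factor (N - n)/(N - 1) is less than 1 whenever n > 1, so the hypergeometric variance is smaller than the binomial variance npq for the same n and p.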


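Following along the same steps as above, we may obtain higher moments but they are rather complicated.

2) If N becomes indefinitely large, the hypergeometric probability distribution tends to the binomial probability distribution.

Let $p = \frac{k}{N}$. Then k = Np and N - k = N(1 - p) = Nq.

Substituting these values in the hypergeometric distribution, we get

$$h(x; N, n, k) = \frac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}}$$
$$= \frac{(Np)!}{x!\,(Np-x)!}\cdot\frac{(Nq)!}{(n-x)!\,(Nq-n+x)!}\cdot\frac{n!\,(N-n)!}{N!}$$
$$= \binom{n}{x}\frac{(Np)!\,(Nq)!\,(N-n)!}{(Np-x)!\,(Nq-n+x)!\,N!}.$$

Applying Stirling's approximation ($n! \approx e^{-n} n^{n+1/2}\sqrt{2\pi}$) to all factorial terms and simplifying, we get

$$h(x; N, n, k) \approx \binom{n}{x} p^{x} q^{n-x}\;\frac{\left(1 - \frac{n}{N}\right)^{N-n+1/2}}{\left(1 - \frac{x}{Np}\right)^{Np-x+1/2}\left(1 - \frac{n-x}{Nq}\right)^{Nq-n+x+1/2}}.$$

Now, if N is allowed to become indefinitely large, then $\frac{n}{N}$, $\frac{x}{Np}$ and $\frac{n-x}{Nq}$ each approach zero, and the three bracketed factors tend to $e^{-n}$, $e^{-x}$ and $e^{-(n-x)}$ respectively, so that their ratio tends to 1.

Therefore
$$h(x; N, n, k) \to \binom{n}{x} p^{x} q^{n-x} = b(x; n, p).$$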
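The limiting behaviour can be seen numerically by holding k/N fixed while N grows. The sketch below is illustrative; the helper names and the values n = 5, x = 2, p = 0.3 are assumptions.

```python
from math import comb

def hypergeom(x: int, N: int, n: int, k: int) -> float:
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binom(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, x, p = 5, 2, 0.3
target = binom(x, n, p)

gaps = []
for N in (20, 200, 2000, 20000):
    k = round(N * p)                     # keep k/N fixed at p while N grows
    gaps.append(abs(hypergeom(x, N, n, k) - target))

print(target, gaps)
```

The gap shrinks roughly in proportion to 1/N, which is why sampling without replacement from a large population is commonly treated as binomial.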


8.4 POISSON DISTRIBUTION

The Poisson distribution, named after the French mathematician Siméon Denis Poisson (1781-1840) who published its derivation in 1837, is used as

a) a limiting approximation of the binomial distribution b(x; n, p), when p, the probability of success, is very small but n, the number of trials, is so large that the product np = μ is of a moderate size;

b) a distribution in its own right by considering a Poisson process where events occur randomly over a specified interval of time or space or length. Such random events might be the number of deaths by horse-kicks per year; the number of telephone calls received at a switchboard; the number of taxicab arrivals at an intersection; the number of babies born blind per year in a large city; the number of typing errors; the number of red blood cells in a specimen of blood; the number of radioactive particles emitted in a given period; the number of flaws per unit length of some material; the number of claims made to a company in a given time; etc.

Generally, most statisticians use the Poisson approximation when p is 0.05 or less and n is large. If n goes to infinity and p approaches zero in such a way that μ = np remains constant, then the limiting form of the binomial probability distribution is

$$\lim_{n\to\infty} b(x; n, p) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots, \infty$$

where e = 2.71828. The Poisson distribution has only one parameter μ > 0, and is denoted by p(x; μ). The parameter μ may be interpreted as the mean rate of occurrence of events. It is to be noted that this is a probability distribution as the function is obviously non-negative, i.e. p(x; μ) ≥ 0, and the sum of the probabilities is unity, i.e.

$$\sum_{x=0}^{\infty} p(x; \mu) = \sum_{x=0}^{\infty} P(X = x) = e^{-\mu}\sum_{x=0}^{\infty}\frac{\mu^{x}}{x!} = e^{-\mu}e^{\mu} = 1.$$

The Poisson probability distribution is also called the law of small numbers or the rare events distribution. It has found wide application in the fields of Biology, Physics, Operation Research and Management Sciences. The Poisson distribution is appropriate when the number of possible occurrences is very large but the number of actual occurrences is very small in a fixed period of time.


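8.4.1 Derivation of Poisson Approximation to the Binomial. To derive an approximation to the binomial distribution b(x; n, p) when n → ∞, p → 0, and the product np remains constant, we proceed as below:

The binomial distribution b(x; n, p) may be written as

$$b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}, \qquad \text{for } x = 0, 1, \ldots, n$$
$$= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\; p^{x} q^{n-x}.$$

Let np = μ. Then $p = \frac{\mu}{n}$ and $q = 1 - p = 1 - \frac{\mu}{n}$.

Substituting for all terms involving p, we get

$$b(x; n, p) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\left(\frac{\mu}{n}\right)^{x}\left(1 - \frac{\mu}{n}\right)^{n-x}$$
$$= \frac{\mu^{x}}{x!}\cdot\frac{n(n-1)(n-2)\cdots(n-x+1)}{n^{x}}\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}$$
$$= \frac{\mu^{x}}{x!}\,(1)\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\mu}{n}\right)^{n}\left(1 - \frac{\mu}{n}\right)^{-x}.$$

As n → ∞ and p → 0 so that np = μ remains constant, we observe that each of the terms $\left(1 - \frac{1}{n}\right), \left(1 - \frac{2}{n}\right), \ldots, \left(1 - \frac{x-1}{n}\right)$ and $\left(1 - \frac{\mu}{n}\right)^{-x}$ approaches unity. The term $\left(1 - \frac{\mu}{n}\right)^{n}$ may be written as

$$\left[\left(1 - \frac{1}{k}\right)^{k}\right]^{\mu}, \qquad \text{where } k = \frac{n}{\mu}.$$

If n increases indefinitely, so does k. Therefore $\left(1 - \frac{1}{k}\right)^{k}$ tends to $e^{-1}$, where e = 2.71828.

$$\therefore\ \lim_{n\to\infty}\left(1 - \frac{\mu}{n}\right)^{n} = e^{-\mu}.$$

Thus the limiting value of P(X = x) is given by the expression

$$\lim_{n\to\infty} b(x; n, p) = \frac{\mu^{x}}{x!}\cdot 1\cdot e^{-\mu} = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad \text{for } x = 0, 1, \ldots, \infty.$$

In other words, if X is a binomial r.v. such that

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \text{ then}$$

$$\lim_{\substack{n\to\infty \\ np=\mu}} P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots, \infty.$$

It is denoted by p(x; μ). Hence a r.v. X having p.d. p(x; μ) is said to have a Poisson distribution with parameter μ.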
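The limit can be observed numerically by holding np = μ fixed and letting n grow. This is a minimal sketch; the function names and the values μ = 2, x = 3 are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """p(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

mu, x = 2.0, 3
limit = poisson_pmf(x, mu)

# Keep n * p = mu fixed while n increases, so p = mu / n tends to zero.
errors = [abs(binomial_pmf(x, n, mu / n) - limit) for n in (10, 100, 1000, 10000)]
print(limit, errors)
```

Each tenfold increase in n reduces the discrepancy by roughly a factor of ten in this illustration.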

Example 8.15 If X is a Poisson random variable with parameter μ = 2, find the probabilities for x = 0, 1, 2, 3 or more. (P.U., B.A.)

Here the Poisson distribution is

$$p(x; 2) = \frac{e^{-2}\,2^{x}}{x!} \qquad (x = 0, 1, 2, \ldots)$$

The desired probabilities for x = 0, 1, 2, 3 or more are computed as below:

$$P(X = 0) = p(0; 2) = e^{-2} = 0.135335$$
$$P(X = 1) = p(1; 2) = \frac{e^{-2}\,2}{1!} = 2(0.135335) = 0.27067$$
$$P(X = 2) = p(2; 2) = \frac{e^{-2}\,2^{2}}{2!} = 2(0.135335) = 0.27067,$$
$$P(X \ge 3) = 1 - P(X < 3)$$
$$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$$
$$= 1 - [0.135335 + 0.27067 + 0.27067]$$
$$= 1 - 0.676675 = 0.323325$$

Example 8.16 Two hundred passengers have made reservations for an airplane flight. If the probability that a passenger who has a reservation will not show up is 0.01, what is the probability that exactly three will not show up?

Let us regard a "no show" as success. Then this is essentially a binomial experiment with n = 200 and p = 0.01. Since p is very small and n is considerably large, we shall apply the Poisson distribution, using μ = (200)(0.01) = 2.

Therefore, if X represents the number of successes (not showing up), we have

$$P(X = 3) = p(3; 2) = \frac{e^{-2}\,(2)^{3}}{3!}$$
$$= \frac{8 \times 0.1353}{3 \times 2 \times 1} \qquad (\because\ e^{-2} = 0.1353,\ e = 2.71828)$$
$$= 0.1804.$$
Example 8.17 The probability that a man aged 50 years will die within a year is 0.01125. What is the probability that of 12 such men at least 11 will reach their fifty-first birthday?

Here p = 0.01125 and n = 12. We compute the desired probability by means of the Poisson distribution as the probability of death is very small.

Therefore μ = np = 12 × (0.01125) = 0.135, and the Poisson distribution is

$$p(x; 0.135) = \frac{e^{-0.135}\,(0.135)^{x}}{x!}.$$

Now the probability that no person will die, i.e. all the 12 persons will survive, is

$$p(0; 0.135) = e^{-0.135} = 1 - 0.135 + \frac{(0.135)^{2}}{2!} - \frac{(0.135)^{3}}{3!} + \cdots = 0.8737,$$

and the probability that 1 person will die, i.e. 11 persons will survive, is

$$p(1; 0.135) = \frac{e^{-0.135}\,(0.135)^{1}}{1!} = (0.8737)(0.135) = 0.1179.$$

Hence the probability that at least 11 persons will survive

= p(0; 0.135) + p(1; 0.135)
= 0.8737 + 0.1179 = 0.9916.

Example 8.18 A sampling plan calls for taking 100 items from a very large lot which is one percent defective. Let X be the number of defectives found in the sample of 100. Construct a table of probabilities P(X = x), for x = 0, 1, 2, 3, 4, using first the binomial probability distribution and then the Poisson approximation.

Let p denote the probability that an item is defective. Then the binomial probabilities with n = 100, p = 0.01 and q = 0.99 are

$$P(X = x) = \binom{100}{x}(0.01)^{x}(0.99)^{100-x}, \qquad x = 0, 1, 2, 3 \text{ and } 4.$$

These probabilities by means of the Poisson approximation, using μ = np = (100)(0.01) = 1, are

$$P(X = x) = p(x; 1) = \frac{e^{-1}\,(1)^{x}}{x!}, \qquad x = 0, 1, 2, 3 \text{ and } 4.$$

The desired probabilities, on simplification, are given in the following table:

x                        0        1        2        3        4
Binomial probability     0.3660   0.3697   0.1849   0.0610   0.0149
Poisson approximation    0.3679   0.3679   0.1839   0.0613   0.0153

These results indicate that the approximation is very close.


8.4.2 Poisson Frequency Distribution. If the Poisson distribution is multiplied by N, the number of sets of experiments, each of n trials, the resulting distribution is known as the Poisson frequency distribution, and is given by

$$N\cdot p(x; \mu) = N\cdot\frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Example 8.19 For a machine making parts, there is a small probability of 0.002 for a part to be defective. The parts are supplied in bundles of 10. Calculate approximately the number of bundles containing no defective, one defective or two defectives in a consignment of 10,000 bundles, given $e^{-0.02} = 0.9802$.

Let p be the probability of a part being defective. Then p = 0.002 and n = 10.

Since p is extremely small, we apply the Poisson approximation, using μ = np = 10 × 0.002 = 0.02. Hence the approximate numbers of bundles containing no defective, one defective or two defectives are the terms for x = 0, 1, and 2 in

$$N\cdot p(x; \mu) = 10000\cdot\frac{e^{-0.02}\,(0.02)^{x}}{x!}.$$

Putting x = 0, we get
$$10000 \times e^{-0.02} = 10000 \times 0.9802 = 9{,}802.$$

Putting x = 1, we get
$$10000 \times e^{-0.02}\,(0.02) = 10000 \times 0.9802 \times 0.02 = 196.$$

Putting x = 2, we obtain
$$10000\cdot\frac{e^{-0.02}\,(0.02)^{2}}{2!} = \frac{10000 \times 0.9802 \times (0.02)^{2}}{2} = 1.96, \text{ i.e. } 2 \text{ approx.}$$

8.4.3 Properties of the Poisson Distribution. Some of the main properties of the Poisson distribution are given below:

1. If the random variable X has a Poisson distribution with parameter μ, then its mean and variance are given by E(X) = μ and Var(X) = μ.

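By definition,

$$\text{Mean} = E(X) = \sum_{x=0}^{\infty} x\,p(x; \mu), \qquad \text{where } p(x; \mu) = \frac{e^{-\mu}\mu^{x}}{x!}$$
$$= 0\cdot e^{-\mu} + 1\cdot\mu e^{-\mu} + 2\cdot\frac{\mu^{2}e^{-\mu}}{2!} + 3\cdot\frac{\mu^{3}e^{-\mu}}{3!} + \cdots$$
$$= \mu e^{-\mu}\left[1 + \mu + \frac{\mu^{2}}{2!} + \cdots\right]$$
$$= \mu e^{-\mu}\cdot e^{\mu} = \mu.$$

Alternatively,
$$E(X) = \sum_{x=0}^{\infty} x\,\frac{e^{-\mu}\mu^{x}}{x!} = \mu e^{-\mu}\sum_{x=1}^{\infty}\frac{\mu^{x-1}}{(x-1)!}$$
(since the first term in the summation being zero is omitted).

Let y = x - 1, then
$$E(X) = \mu e^{-\mu}\sum_{y=0}^{\infty}\frac{\mu^{y}}{y!} \qquad (y = 0, 1, 2, \ldots, \infty)$$
$$= \mu e^{-\mu}e^{\mu} = \mu, \text{ the parameter of the distribution.}$$

$$\text{Var}(X) = E(X^{2}) - [E(X)]^{2}, \text{ where}$$
$$E(X^{2}) = E[X(X-1) + X] = E(X) + E[X(X-1)]$$
$$= \mu + \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\mu}\mu^{x}}{x!}$$
$$= \mu + \mu^{2}e^{-\mu}\sum_{x=2}^{\infty}\frac{\mu^{x-2}}{(x-2)!}$$
(x starts at 2, as the first two terms in the summation are zero).

Let y = x - 2, then
$$E(X^{2}) = \mu + \mu^{2}e^{-\mu}\sum_{y=0}^{\infty}\frac{\mu^{y}}{y!} \qquad (y = 0, 1, 2, \ldots)$$
$$= \mu + \mu^{2}e^{-\mu}e^{\mu} = \mu + \mu^{2}.$$

Hence $\text{Var}(X) = \mu + \mu^{2} - \mu^{2} = \mu$.

We observe that the Poisson distribution has an interesting property that its mean equals its variance.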
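The equality of mean and variance can be checked numerically by truncating the infinite sums once the terms become negligible. This is a sketch only; μ = 2.5 and the cut-off of 60 terms are illustrative assumptions.

```python
from math import exp, factorial

mu = 2.5                                   # illustrative parameter value
# Terms beyond x = 59 are far below floating-point precision for this mu.
probs = [exp(-mu) * mu**x / factorial(x) for x in range(60)]

total = sum(probs)
mean = sum(x * f for x, f in enumerate(probs))
variance = sum(x * x * f for x, f in enumerate(probs)) - mean**2

print(total, mean, variance)
```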


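2. Higher moments of the distribution are obtained as below:

By definition, $\mu_3' = E(X^3) = \sum_{x=0}^{\infty} x^3 p(x;\mu)$.

Writing $x^3$ in the factorial form as
$x^3 = x(x-1)(x-2) + 3x(x-1) + x$, we have

$$\mu_3' = \sum_{x=0}^{\infty}\,[x(x-1)(x-2) + 3x(x-1) + x]\,\frac{e^{-\mu}\mu^x}{x!}$$
$$= \mu^3 e^{-\mu}\sum_{x=3}^{\infty}\frac{\mu^{x-3}}{(x-3)!} + 3\mu^2 e^{-\mu}\sum_{x=2}^{\infty}\frac{\mu^{x-2}}{(x-2)!} + \mu e^{-\mu}\sum_{x=1}^{\infty}\frac{\mu^{x-1}}{(x-1)!}$$
$$= \mu^3 + 3\mu^2 + \mu, \text{ and}$$

$$\mu_4' = E(X^4) = \sum_{x=0}^{\infty} x^4 p(x;\mu).$$

Writing $x^4$ in the factorial form as
$x^4 = x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x$, we get

$$\mu_4' = \sum_{x=0}^{\infty}\,[x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x]\,\frac{e^{-\mu}\mu^x}{x!}$$
$$= \mu^4 + 6\mu^3 + 7\mu^2 + \mu.$$

Then
$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = (\mu^3 + 3\mu^2 + \mu) - 3\mu(\mu^2 + \mu) + 2\mu^3 = \mu;$$
$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$$
$$= (\mu^4 + 6\mu^3 + 7\mu^2 + \mu) - 4\mu(\mu^3 + 3\mu^2 + \mu) + 6\mu^2(\mu^2 + \mu) - 3\mu^4 = 3\mu^2 + \mu.$$

The moment ratios are therefore
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{\mu^2}{\mu^3} = \frac{1}{\mu}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3\mu^2 + \mu}{\mu^2} = 3 + \frac{1}{\mu}.$$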
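
The moment results above can be checked numerically. The following is only a rough sketch: the function name, the value $\mu = 2.5$ and the truncation of the infinite sum at 200 terms are illustrative assumptions, not part of the text.

```python
from math import exp

def poisson_central_moments(mu: float, terms: int = 200):
    """Return (mean, mu_2, mu_3, mu_4) of a Poisson distribution by direct summation.

    The infinite series is truncated after `terms` terms (an assumption that is
    harmless for moderate mu, because the tail probabilities are negligible).
    """
    probs = []
    p = exp(-mu)                 # p(0; mu)
    for x in range(terms):
        probs.append(p)
        p *= mu / (x + 1)        # p(x+1; mu) = mu/(x+1) * p(x; mu)
    mean = sum(x * px for x, px in enumerate(probs))

    def central(r: int) -> float:
        return sum((x - mean) ** r * px for x, px in enumerate(probs))

    return mean, central(2), central(3), central(4)

mu = 2.5  # illustrative value only
mean, m2, m3, m4 = poisson_central_moments(mu)
print(mean, m2, m3, m4)          # compare with mu, mu, mu and 3*mu**2 + mu
```

The printed values should agree, up to rounding error, with $\mu$, $\mu$, $\mu$ and $3\mu^2 + \mu$ respectively.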


3. The shape of the Poisson distribution depends on the value of its parameter $\mu$. As the
distribution takes on an infinite number of x values (theoretically), the distribution will be
positively skewed. The distribution tends to be symmetrical as $\mu$ becomes larger and larger.


4. Reproductive Property. If two independent r.v.'s X and Y have Poisson distributions with
parameters $\mu$ and $\nu$, then their sum X + Y also has a Poisson distribution with parameter $\mu + \nu$.


Here X is $p(x; \mu)$ and Y is $p(y; \nu)$; and we desire to find $P(X+Y=k)$ for $k = 0, 1, 2, \ldots$

For $k = 0$,
$$P(X+Y=0) = P(X=0)\,P(Y=0) \quad (\because X \text{ and } Y \text{ are independent})$$
$$= e^{-\mu}e^{-\nu} = e^{-(\mu+\nu)}.$$

For $k = 1$,
$$P(X+Y=1) = P(X=0)\,P(Y=1) + P(X=1)\,P(Y=0)$$
$$= e^{-\mu}\,\nu e^{-\nu} + \mu e^{-\mu}\,e^{-\nu} = e^{-(\mu+\nu)}(\mu+\nu).$$

Similarly for $k = 2$,
$$P(X+Y=2) = \frac{e^{-(\mu+\nu)}(\mu+\nu)^2}{2!}.$$

And, in general, for $X + Y = k$ when $X = i$ and $Y = k-i$ for $i = 0, 1, 2, \ldots, k$, we have

$$P(X+Y=k) = P(X=0)\,P(Y=k) + P(X=1)\,P(Y=k-1) + \cdots + P(X=k)\,P(Y=0)$$
$$= \sum_{i=0}^{k}\frac{e^{-\mu}\mu^{i}}{i!}\cdot\frac{e^{-\nu}\nu^{k-i}}{(k-i)!} = e^{-(\mu+\nu)}\sum_{i=0}^{k}\frac{\mu^{i}\nu^{k-i}}{i!\,(k-i)!}.$$

Multiplying the r.h.s. by $\dfrac{k!}{k!}$, we get

$$P(X+Y=k) = \frac{e^{-(\mu+\nu)}}{k!}\sum_{i=0}^{k}\binom{k}{i}\mu^{i}\nu^{k-i} = \frac{e^{-(\mu+\nu)}(\mu+\nu)^{k}}{k!},$$
which is a Poisson distribution with parameter $\mu + \nu$. This result can easily be generalized.
The converse of this result, that is, if X and Y are independent and X + Y has a Poisson
distribution, then each of the r.v.'s X and Y has a Poisson distribution, is also true, as was proved by Raikov.
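
The reproductive property can be illustrated numerically by convolving two Poisson distributions term by term. This is a minimal sketch; the helper names and the parameter values 1.2 and 0.7 are arbitrary assumptions chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """p(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)

def sum_pmf(k: int, mu: float, nu: float) -> float:
    """P(X + Y = k) for independent Poisson X and Y, summing P(X=i) P(Y=k-i) over i."""
    return sum(poisson_pmf(i, mu) * poisson_pmf(k - i, nu) for i in range(k + 1))

mu, nu = 1.2, 0.7  # illustrative parameters only
for k in range(6):
    # The two columns should coincide: convolution versus Poisson(mu + nu).
    print(k, round(sum_pmf(k, mu, nu), 6), round(poisson_pmf(k, mu + nu), 6))
```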


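8.4.4 The Recurrence Formula for the Poisson Distribution. Let X be a Poisson
variable with parameter $\mu$. Then

$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad \text{and}$$

$$P(X = x-1) = \frac{e^{-\mu}\mu^{x-1}}{(x-1)!}.$$

Dividing,
$$\frac{P(X = x)}{P(X = x-1)} = \frac{\mu^{x}}{x!}\cdot\frac{(x-1)!}{\mu^{x-1}} = \frac{\mu}{x}.$$

Therefore $P(X = x) = \dfrac{\mu}{x}\,P(X = x-1)$ for $x = 1, 2, 3, \ldots$ is the recurrence formula for the Poisson
distribution. Using this recurrence relationship, the Poisson probabilities can be obtained more easily.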
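
The recurrence can be sketched in a few lines of code. The function name `poisson_probs` is a hypothetical helper introduced here for illustration; only $e^{-\mu}$ is evaluated directly and every later probability is obtained from the previous one.

```python
from math import exp

def poisson_probs(mu: float, x_max: int) -> list[float]:
    """Return [P(X=0), ..., P(X=x_max)] using P(X=x) = (mu / x) * P(X=x-1)."""
    probs = [exp(-mu)]                    # P(X=0) = e^(-mu)
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * mu / x)  # recurrence step
    return probs

# Example call with an illustrative mean of 0.61
for x, p in enumerate(poisson_probs(0.61, 5)):
    print(x, round(p, 4))
```

Building each term from its predecessor avoids computing large powers and factorials separately.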




8.4.5 Fitting a Poisson Distribution to Observed Data. The Poisson distribution can be fitted
(or derived) if we know the value of its mean, which is usually obtained by equating $\mu$ to the mean
estimated from the observed frequency distribution, provided that the probability of occurrence is very
small. Using this value of the mean, the expected or theoretical frequencies are computed. The following
numerical example will illustrate the procedure.


Example 8.20 Bortkiewicz (1868-1931) collected data on the number of deaths from horse-kicks
in the Prussian Army Corps over a period of 20 years. This distribution of deaths was as follows:

| No. of deaths | 0 | 1 | 2 | 3 | 4 | 5 | Total |

Fit a Poisson distribution to these data and compute the theoretical frequencies.

To fit a Poisson distribution, we need to compute (estimate) the value of the mean of the given
distribution and equate it to $\mu$, the mean of the Poisson distribution.

Now
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{122}{200} = 0.61, \text{ which is an estimate of } \mu.$$

Thus the fitted Poisson distribution is given by

$$P(X = x) = \frac{e^{-0.61}(0.61)^{x}}{x!}, \quad \text{where } x = 0, 1, 2, 3, \ldots$$

The theoretical or expected frequencies of x deaths are computed by multiplying the probabilities
by the total frequency, which is 200 here. The probabilities are computed by using the Poisson recurrence
formula, which takes the form

$$P(X = x) = \frac{0.61}{x}\,P(X = x-1), \quad \text{for } x = 1, 2, 3, 4, 5.$$

In case the table values for $e^{-0.61}$ are not available, they are computed by use of logarithms as
follows:

Let $y = e^{-0.61}$.

Then $\log y = -0.61 \log e = (-0.61)(0.4343) = -0.2649 = \bar{1}.7351$, so that $y = 0.5434$.

$P(X=0) = e^{-0.61} = 0.5434$

$P(X=1) = \dfrac{0.61}{1}(0.5434) = 0.3315$

$P(X=2) = \dfrac{0.61}{2}(0.3315) = 0.1011$

$P(X=3) = \dfrac{0.61}{3}(0.1011) = 0.0206$

$P(X=4) = \dfrac{0.61}{4}(0.0206) = 0.0031$

$P(X=5) = \dfrac{0.61}{5}(0.0031) = 0.0004$
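
The fitting procedure can be sketched as follows. Note the assumption: the frequency row of the table is not legible here, so the observed frequencies used below (109, 65, 22, 3, 1, 0) are the commonly quoted horse-kick counts, chosen because they reproduce the total of 200 and the mean 122/200 = 0.61 stated in the example. The variable and function names are illustrative.

```python
from math import exp

# ASSUMED observed frequencies (deaths -> number of corps-years); see the note above.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1, 5: 0}

n = sum(observed.values())                           # total frequency
mean = sum(x * f for x, f in observed.items()) / n   # estimate of mu

def poisson_probs(mu: float, x_max: int) -> list[float]:
    """Poisson probabilities for x = 0..x_max via P(x) = (mu / x) * P(x-1)."""
    probs = [exp(-mu)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * mu / x)
    return probs

probs = poisson_probs(mean, max(observed))
expected = [n * p for p in probs]                    # theoretical frequencies

for x, (p, e) in enumerate(zip(probs, expected)):
    print(x, observed[x], round(p, 4), round(e, 1))
```

Each row shows the number of deaths, the assumed observed frequency, the fitted probability and the theoretical frequency.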


8.4.6 Poisson Process. In an earlier section, the Poisson distribution was derived as a limiting
approximation to the binomial distribution. The Poisson distribution can also be derived from
a Poisson process, which may be defined as a physical process governed at least in part by some
random mechanism. The occurrence of traffic deaths in a city is an example of a Poisson process. A
Poisson process has the following properties:

i) The probability that an event occurs in a very short time interval $\Delta t$ is proportional to the
length of the time interval, i.e. it is approximately $\lambda\,\Delta t$, in which $\lambda$ is a positive quantity
that may be interpreted as the average number of occurrences per unit of time.

ii) The probability that two or more events occur in such a short interval is so small that it can be neglected.

iii) Events occurring in non-overlapping intervals of time are statistically independent.

These properties are also assumed to hold for events occurring randomly in regions of space. Using
these properties, it can be shown that the probability for the number of occurrences of a random event in
an interval of stated length $t$ is given by the Poisson distribution with the parameter $\lambda t$. The
Poisson process formula is

$$p(x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^{x}}{x!},$$
where t = number of units of time,
x = number of occurrences in t units of time, and
$\lambda$ = average number of occurrences per unit of time.
The derivation of the Poisson process formula is beyond the scope of this book.

Example 8.21 Telephone calls are being placed through a certain exchange at random times on the
average of four per minute. Assuming a Poisson process, determine the probability that in a 15-second
interval there are 3 or more calls. (P.U., B.A./B.Sc. 1979)

Taking a minute as the unit of time, we have
$\lambda$ = 4 calls per minute
= 4 calls per 60 seconds.
As 15 seconds is $\frac{15}{60} = \frac{1}{4}$ units of time, so $t = \frac{1}{4}$ and therefore the average number of calls per
15-second interval, i.e. $\lambda t = 4 \times \frac{1}{4} = 1$.

Hence, using the Poisson process formula $p(x; \lambda t) = \dfrac{e^{-\lambda t}(\lambda t)^{x}}{x!}$, we have

P(3 or more calls in 15-second interval)
= 1 - P(0, 1 or 2 calls in 15-second interval)
$$= 1 - \sum_{x=0}^{2} p(x; 1) = 1 - \sum_{x=0}^{2}\frac{e^{-1}(1)^{x}}{x!}$$
$$= 1 - e^{-1}\sum_{x=0}^{2}\frac{1}{x!} \quad (e^{-1} = 0.3679)$$
$$= 1 - 0.3679\left(1 + 1 + \tfrac{1}{2}\right) = 1 - 0.91975 = 0.08025.$$
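
The Poisson process formula lends itself to a short computational sketch. The function names below are hypothetical helpers; the call reproduces the telephone-exchange calculation with $\lambda = 4$ per minute and $t = 1/4$ minute.

```python
from math import exp, factorial

def poisson_process_pmf(x: int, lam: float, t: float) -> float:
    """p(x; lam*t) = e^(-lam*t) * (lam*t)^x / x!"""
    m = lam * t                       # average number of occurrences in the interval
    return exp(-m) * m ** x / factorial(x)

def prob_at_least(k: int, lam: float, t: float) -> float:
    """P(k or more occurrences in t units of time) = 1 - P(fewer than k)."""
    return 1.0 - sum(poisson_process_pmf(x, lam, t) for x in range(k))

# 3 or more calls in a 15-second interval, with 4 calls per minute on average
print(round(prob_at_least(3, lam=4, t=0.25), 4))
```

The result differs from the hand calculation only in the fourth decimal place, because the worked solution rounds $e^{-1}$ to 0.3679 before multiplying.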


Example 8.22 Flaws in a certain type of drapery material appear on the average of one in 150
square feet. If we assume the Poisson distribution, find the probability of at most one flaw in 225 square
feet.

Taking 150 square feet as the unit of area, we have
$\lambda$ = 1 flaw per 150 square feet.

As 225 square feet are $\frac{225}{150} = 1.5$ units of area, so $t = 1.5$ and therefore the average number of
flaws per 225 square feet, i.e. $\lambda t = 1 \times 1.5 = 1.5$. Assuming the flaws are a Poisson process, we have

$$P(\text{at most one flaw in 225 square feet}) = \sum_{x=0}^{1} p(x; \lambda t)$$
$$= \sum_{x=0}^{1}\frac{e^{-1.5}(1.5)^{x}}{x!}, \quad \text{where } e^{-1.5} = 0.2231$$
$$= 0.2231 + 0.3347 = 0.5578.$$

8.4.7 Moment Generating and Cumulant Generating Functions of the Poisson Distribution.
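The m.g.f. for the Poisson distribution with respect to the origin is found as below:

$$M_0(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\,p(x;\mu)$$
$$= \sum_{x=0}^{\infty} e^{tx}\frac{e^{-\mu}\mu^{x}}{x!} = e^{-\mu}\sum_{x=0}^{\infty}\frac{(\mu e^{t})^{x}}{x!}$$
$$= e^{-\mu}\,e^{\mu e^{t}} = e^{\mu(e^{t}-1)}.$$

The mean and variance are obtained as:
$$\mu_1' = \left[\frac{d}{dt}M_0(t)\right]_{t=0} = \left[\mu e^{t}\,e^{\mu(e^{t}-1)}\right]_{t=0} = \mu,$$
$$\mu_2' = \left[\frac{d^2}{dt^2}M_0(t)\right]_{t=0} = \left[\mu e^{t}\,e^{\mu(e^{t}-1)} + \mu^2 e^{2t}\,e^{\mu(e^{t}-1)}\right]_{t=0} = \mu + \mu^2,$$
$$\sigma^2 = \mu_2' - (\mu_1')^2 = \mu + \mu^2 - \mu^2 = \mu.$$

The cumulant generating function (c.g.f.) of the Poisson distribution with respect to the origin is given
by

$$\kappa(t) = \log_e M_0(t) = \mu(e^{t} - 1) = \mu\left(t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\right).$$

$\kappa_r$ = coefficient of $\dfrac{t^{r}}{r!}$ in $\kappa(t)$ = $\mu$, for all r.

Hence all the cumulants of the Poisson distribution are equal to $\mu$.

Example 8.23 Prove the following recurrence formula for a Poisson distribution p(x; m):
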
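$$\mu_{r+1} = r\,m\,\mu_{r-1} + m\,\frac{d\mu_r}{dm}.$$

By definition,
$$\mu_r = \sum_{x=0}^{\infty} p(x;m)(x-m)^{r} = \sum_{x=0}^{\infty}\frac{e^{-m}m^{x}}{x!}(x-m)^{r}.$$

Differentiating with respect to m, we get

$$\frac{d\mu_r}{dm} = -r\sum_{x=0}^{\infty}(x-m)^{r-1}\frac{e^{-m}m^{x}}{x!} + \sum_{x=0}^{\infty}\frac{(x-m)^{r}}{x!}\left[x\,m^{x-1}e^{-m} - m^{x}e^{-m}\right]$$
$$= -r\,\mu_{r-1} + \sum_{x=0}^{\infty}(x-m)^{r}\,\frac{e^{-m}m^{x-1}}{x!}\,(x-m).$$

Multiplying both sides by m and simplifying, we get

$$m\,\frac{d\mu_r}{dm} = -r\,m\,\mu_{r-1} + \sum_{x=0}^{\infty}(x-m)^{r+1}\frac{e^{-m}m^{x}}{x!} = -r\,m\,\mu_{r-1} + \mu_{r+1}.$$

Hence
$$\mu_{r+1} = r\,m\,\mu_{r-1} + m\,\frac{d\mu_r}{dm}.$$
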
8.5 NEGATIVE BINOMIAL DISTRIBUTION


In binomial experiments, the number of successes varies and the number of trials is fixed. But
there are experiments in which the number of successes is fixed and the number of trials varies to produce
the fixed number of successes. Such experiments are called negative binomial experiments. In other
words, a negative binomial experiment possesses the following four properties:

i) The outcomes of each trial may be classified into one of two categories: success (S) and failure (F).
ii) The probability of success, denoted by p, remains constant for all trials.
iii) The successive trials are all independent.
iv) The experiment is repeated a variable number of times to obtain a fixed number of successes.

If X denotes the number of trials to produce k successes in a negative binomial experiment, it is
called a negative binomial variable, and its p.d. is called the negative binomial distribution. When the
negative binomial r.v. X assumes a value x, on which the kth success occurs, the negative binomial
distribution is given by

$$P(X = x) = \binom{x-1}{k-1}p^{k}q^{x-k}, \quad x = k, k+1, k+2, \ldots$$

The negative binomial p.d. has two parameters, k and p > 0, and is generally denoted by b*(x; k, p). To
show that, for the negative binomial distribution, the probability function sums to one, we proceed as
follows:

$$\text{Sum of prob.} = \sum_{x=k}^{\infty}\binom{x-1}{k-1}p^{k}q^{x-k} \quad (x = k, k+1, k+2, \ldots)$$

Let $y = x - k$. Then
$$\text{Sum of prob.} = p^{k}\sum_{y=0}^{\infty}\binom{y+k-1}{k-1}q^{y} \quad (y = 0, 1, 2, \ldots)$$
$$= p^{k}\left[1 + kq + \frac{k(k+1)}{2!}q^{2} + \cdots\right]$$
$$= p^{k}(1-q)^{-k} = p^{k}\,p^{-k} = 1.$$

The distribution takes its name from the fact that $\binom{x-1}{k-1}p^{k}q^{x-k}$ is a general term in the expansion of
$p^{k}(1-q)^{-k}$, a binomial with negative index. Thus the probabilities at the kth, (k+1)th, (k+2)th, ...
trials are

$$p^{k}\left[1,\ kq,\ \frac{k(k+1)}{2!}q^{2},\ \ldots\right].$$

This distribution is sometimes also called the Pascal distribution, after the French mathematician
Pascal (1623-1662). The distribution is found to occur in many biological situations and in
sampling from a binomial population.
8.5.1 Derivation of the Negative Binomial Distribution. The negative binomial distribution can
be derived in various ways. The following derivation is based on the Bernoulli trials.

To find an expression for the probability that x trials are made to achieve k successes with the
condition that the last trial must be a success, we proceed as follows:

A sequence containing k successes in exactly x independent trials, with the condition that the last trial is
a success, can be obtained as

$$\underbrace{SS\ldots S}_{k-1\ \text{times}}\ \underbrace{FF\ldots F}_{x-k\ \text{times}}\ S.$$

Thus the probability of a success on the xth trial preceded by (k-1) successes and (x-k) failures in this
specified order is

$$p^{k-1}\,q^{x-k}\,p = p^{k}q^{x-k}.$$

Since the last trial must be a success, the total number of mutually exclusive ways in
which (k-1) successes and (x-k) failures preceding the last success can occur in any order is

$$\binom{x-1}{k-1}.$$

Hence the formula for the probability that the kth success occurs on the xth trial is
$$P(X = x) = \binom{x-1}{k-1}p^{k}q^{x-k}, \quad \text{where } x = k, k+1, k+2, \ldots$$
The negative binomial distribution can also be obtained when two or more Poisson r.v.'s are 


term by term. 


Example 8.24 A person throws a pair of fair dice. What is the probability that he will get a total
of 7 for the second time on the eighth throw?

The probability that he gets a total of 7 is $\frac{6}{36}$, i.e. $p = \frac{1}{6}$.

Since the number of successes is fixed, the negative binomial distribution with k = 2
(second success) and x = 8 is used.

$$P(X = 8) = \binom{7}{1}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{6} = 0.0651.$$

Example 8.25 Three people each toss a coin and the odd man pays for the coffee. If the coins all
show heads or all show tails, they are tossed again. Find the probability that a decision is reached in
5 tosses or fewer.

To reach a decision on any trial, the coins must result in either '2 heads and 1 tail' or '2 tails and 1
head'. The probability of these events is given by the binomial distribution because

i) each coin has two possible results, a head or a tail,
ii) the probability of a head is $p = \frac{1}{2}$ and remains the same for each coin,
iii) three people toss the coins independently,
iv) three coins are tossed in each case (i.e., n = 3).

$$P(\text{2 heads and 1 tail}) = \binom{3}{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right) = \frac{3}{8}$$

$$P(\text{1 head and 2 tails}) = \binom{3}{1}\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^{2} = \frac{3}{8}$$

Thus the probability of reaching a decision is $\frac{3}{8} + \frac{3}{8} = \frac{6}{8} = 0.75$.

Next, we find the probability that a decision is reached in 5 tosses or fewer. We observe that

i) we reach or do not reach a decision with each set,
ii) the probability of reaching a decision is p = 0.75 and remains the same for all sets,
iii) the results of all sets are independent,
iv) a variable number of sets is required to produce 1 decision.

We therefore compute the required probability by means of the negative binomial distribution,
where
k = 1, p = 0.75 and x ≤ 5 (the number of trials).
Thus
$$P(X \le 5) = \sum_{x=1}^{5}\binom{x-1}{0}(0.75)^{1}(0.25)^{x-1}$$
= 0.75 + 0.1875 + 0.0469 + 0.0117 + 0.0029
= 0.9990
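
The negative binomial probabilities used in the two examples above can be reproduced with a short sketch. The function name `neg_binomial_pmf` is a hypothetical helper; it implements the trials form $\binom{x-1}{k-1}p^{k}q^{x-k}$.

```python
from math import comb

def neg_binomial_pmf(x: int, k: int, p: float) -> float:
    """P(kth success occurs on trial x) = C(x-1, k-1) * p^k * (1-p)^(x-k), x >= k."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

# Second total of 7 on the eighth throw of a pair of dice (p = 1/6, k = 2)
print(round(neg_binomial_pmf(8, 2, 1 / 6), 4))

# Decision reached in 5 sets or fewer (p = 0.75, k = 1)
print(round(sum(neg_binomial_pmf(x, 1, 0.75) for x in range(1, 6)), 4))
```

With k = 1 the same function gives the geometric probabilities, which is why it also serves for the coin-tossing example.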


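8.5.2 Properties of the Negative Binomial Distribution. The important properties of the
negative binomial distribution are given below:


1. The mean of the negative binomial distribution is less than its variance.
We find the mean and variance by deriving the m.g.f. (here x counts the failures that precede the
kth success, so that x = 0, 1, 2, ...).


The m.g.f. about the origin is
$$M_0(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\binom{x+k-1}{k-1}p^{k}q^{x} = p^{k}\sum_{x=0}^{\infty}\binom{x+k-1}{k-1}(qe^{t})^{x} = p^{k}(1-qe^{t})^{-k}.$$

Now
$$\text{Mean} = E(X) = \left[\frac{dM_0(t)}{dt}\right]_{t=0} = \left[p^{k}\,kqe^{t}(1-qe^{t})^{-k-1}\right]_{t=0}$$
$$= p^{k}\,kq(1-q)^{-k-1} = kq\,p^{k}\,p^{-k-1} = \frac{kq}{p},$$

and $\sigma^2 = E(X^2) - [E(X)]^2$, where
$$E(X^2) = \left[\frac{d^2M_0(t)}{dt^2}\right]_{t=0} = \left[p^{k}kqe^{t}(1-qe^{t})^{-k-1} + p^{k}k(k+1)q^{2}e^{2t}(1-qe^{t})^{-k-2}\right]_{t=0}$$
$$= p^{k}kq(1-q)^{-k-1} + k(k+1)q^{2}p^{k}(1-q)^{-k-2} = \frac{kq}{p} + \frac{k(k+1)q^{2}}{p^{2}}.$$

Therefore
$$\sigma^2 = \frac{kq}{p} + \frac{k(k+1)q^{2}}{p^{2}} - \left(\frac{kq}{p}\right)^{2} = \frac{kqp + k^{2}q^{2} + kq^{2} - k^{2}q^{2}}{p^{2}} = \frac{kq(p+q)}{p^{2}} = \frac{kq}{p^{2}}.$$

The variance will be greater than the mean, if
$$\frac{kq}{p^{2}} > \frac{kq}{p} \quad \text{or} \quad \frac{1}{p^{2}} > \frac{1}{p},$$
i.e. if 1 > p, which is obviously true.


Hence we observe that the variance of the negative binomial distribution is greater than its mean.
This is an important feature of this distribution.


2. The negative binomial distribution is always positively skewed.


8.6 GEOMETRIC DISTRIBUTION


When an experiment consists of independent trials with probability p of success and the trials are
repeated until the first success occurs, it is called a geometric experiment. In other words, a geometric
experiment has the following four properties:
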
i) The outcomes of each trial may be classified into one of two categories, success and failure.
ii) The probability of success p remains constant for all trials.
iii) The successive trials are all independent.
iv) The experiment is repeated a variable number of times until the first success is obtained.

If X represents the number of trials needed for the first success, then X is called a geometric r.v. and
its p.d. is called the geometric distribution. It has only one parameter p and is denoted by g(x; p). The
geometric distribution derives its name from the fact that its successive terms constitute a geometric
progression. Since a geometric r.v. represents how long one has to wait for a success, it is also called a
waiting time r.v. It is interesting to note that a geometric distribution is a special case of a negative
binomial distribution when k = 1.


8.6.1 Derivation of the Geometric Distribution. Let the random variable X denote the number
of trials required up to and including the first success of an event. Then X takes the values 1, 2, 3, ..., ∞.
X = x if and only if the first (x-1) trials result in failures and the xth trial yields a success in the
sequence of trials; we therefore have the probability distribution of X as

$$P(X = x) = q^{x-1}p, \quad x = 1, 2, 3, \ldots, \infty.$$

This is obviously a probability distribution as (i) $P(X = x) \ge 0$ and (ii) the sum of the probabilities is
unity, i.e.

$$\sum_{x=1}^{\infty} P(X = x) = p + qp + q^{2}p + q^{3}p + \cdots$$
$$= p\,[1 + q + q^{2} + q^{3} + \cdots]$$
$$= p\,[1-q]^{-1} = p\,p^{-1} = 1.$$
Example 8.26 If the probability that a person will believe a rumour about the re! 
certain politician is 0.25, what is the probability that 
a) the sixth person to hear the rumour will be the first to believe it;
b) the twelfth person to hear the rumour will be the fourth to believe it?


Let X denote the number of the person who hears the rumour. Then a person who
believes it will be considered a success.


a) Since the sixth person is the first to believe the rumour, i.e. the first success occurs on the sixth
trial, the geometric distribution with p = 0.25 and x = 6 is appropriate.
Hence, using the geometric distribution $P(X = x) = pq^{x-1}$, we get
$$P(X = 6) = (0.25)(0.75)^{5} = 0.059.$$


b) Since the twelfth person hearing the rumour will be the fourth to believe it implies that the fourth
success occurs on the 12th trial, the negative binomial distribution with p = 0.25, k = 4
and x = 12 is appropriate.


Hence, using the negative binomial distribution

$$b^{*}(x; k, p) = \binom{x-1}{k-1}p^{k}q^{x-k},$$

we get

$$b^{*}(12; 4, 0.25) = \binom{12-1}{4-1}(0.25)^{4}(0.75)^{8}$$
$$= \frac{11!}{3!\,8!}\left(\frac{1}{4}\right)^{4}\left(\frac{3}{4}\right)^{8} = \frac{165 \times 6561}{256 \times 65536}$$
$$= 0.0645.$$
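
Both parts of the rumour example can be checked with a small sketch. The helper names are assumptions introduced for illustration; `geometric_pmf` implements $pq^{x-1}$ and `neg_binomial_pmf` implements $\binom{x-1}{k-1}p^{k}q^{x-k}$.

```python
from math import comb

def geometric_pmf(x: int, p: float) -> float:
    """P(first success on trial x) = p * (1-p)^(x-1), for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1)

def neg_binomial_pmf(x: int, k: int, p: float) -> float:
    """P(kth success on trial x) = C(x-1, k-1) * p^k * (1-p)^(x-k), for x >= k."""
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

print(round(geometric_pmf(6, 0.25), 3))         # part (a)
print(round(neg_binomial_pmf(12, 4, 0.25), 4))  # part (b)
```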
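8.6.2 Properties of the Geometric Distribution:
1. The mean and variance of the geometric distribution are $\mu = 1/p$ and $\sigma^2 = q/p^2$.
Let the r.v. X have a geometric distribution $g(x; p) = pq^{x-1}$. Then
$$\mu = E(X) = \sum_{x=1}^{\infty} x\,g(x; p) = \sum_{x=1}^{\infty} x\,q^{x-1}p, \quad \text{where } x = 1, 2, 3, \ldots, \infty$$
$$= p + 2qp + 3q^{2}p + 4q^{3}p + \cdots$$
$$= p\,[1 + 2q + 3q^{2} + 4q^{3} + \cdots]$$
$$= p\,[1-q]^{-2} = p\,p^{-2} = \frac{1}{p}; \text{ and}$$

$\sigma^2 = E(X^2) - [E(X)]^2$, where
$$E(X^2) = \sum_{x=1}^{\infty} x^{2}q^{x-1}p$$
$$= p + 2^{2}qp + 3^{2}q^{2}p + 4^{2}q^{3}p + \cdots$$
$$= p\,[1 + 4q + 9q^{2} + 16q^{3} + \cdots]$$
$$= p\,[(1 + 3q + 6q^{2} + 10q^{3} + \cdots) + (q + 3q^{2} + 6q^{3} + \cdots)]$$
$$= p\,[(1-q)^{-3} + q(1-q)^{-3}] = \frac{1+q}{p^{2}}.$$

Hence
$$\sigma^2 = \frac{1+q}{p^{2}} - \frac{1}{p^{2}} = \frac{q}{p^{2}}.$$

2. The geometric distribution is positively skewed.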
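
The mean and variance formulas can be verified by summing the series directly. This is only a sketch under stated assumptions: the function name, the choice p = 0.25 and the truncation after 2000 terms are illustrative.

```python
def geometric_mean_var(p: float, terms: int = 2000):
    """Approximate the mean and variance of g(x; p) = p * q^(x-1) by truncated sums."""
    q = 1 - p
    probs = [p * q ** (x - 1) for x in range(1, terms + 1)]
    mean = sum(x * px for x, px in enumerate(probs, start=1))
    second = sum(x * x * px for x, px in enumerate(probs, start=1))
    return mean, second - mean ** 2

p = 0.25  # illustrative value only
mean, var = geometric_mean_var(p)
print(mean, 1 / p)              # numerical mean versus 1/p
print(var, (1 - p) / p ** 2)    # numerical variance versus q/p^2
```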
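8.6.3 Moment Generating Function of the Geometric Distribution. The m.g.f. of the geometric
distribution is derived as below:

$$M_0(t) = E(e^{tX}) = \sum_{x=1}^{\infty} e^{tx}q^{x-1}p$$
$$= pe^{t}\sum_{x=1}^{\infty}(qe^{t})^{x-1}$$
$$= pe^{t}\,[1 + qe^{t} + (qe^{t})^{2} + \cdots]$$
$$= pe^{t}\,[1 - qe^{t}]^{-1} = \frac{pe^{t}}{1 - qe^{t}}, \quad \text{where } qe^{t} < 1.$$

To differentiate the m.g.f., we write it as

$$M_0(t) = \frac{p}{e^{-t} - q} = p\,(e^{-t} - q)^{-1}.$$

Thus
$$M_0'(t) = pe^{-t}(e^{-t} - q)^{-2}, \text{ and}$$
$$M_0''(t) = 2pe^{-2t}(e^{-t} - q)^{-3} - pe^{-t}(e^{-t} - q)^{-2}.$$

Hence
$$E(X) = p(1-q)^{-2} = \frac{1}{p},$$
$$E(X^2) = 2p(1-q)^{-3} - p(1-q)^{-2} = \frac{2}{p^{2}} - \frac{1}{p},$$
and
$$\sigma^2 = E(X^2) - [E(X)]^2 = \frac{2}{p^{2}} - \frac{1}{p} - \frac{1}{p^{2}} = \frac{1}{p^{2}} - \frac{1}{p} = \frac{1-p}{p^{2}} = \frac{q}{p^{2}}.$$
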

8.7 MULTINOMIAL DISTRIBUTION 


A binomial experiment becomes a multinomial experiment when there are more than two possible
outcomes of each trial. For example, manufactured items may be classified as good, average, or poor;
or a road accident may result in no injury, minor injuries, severe injuries, or fatal injuries. A multinomial
experiment has the following properties:


i) The outcomes of each trial may be classified into one of k mutually exclusive
categories $C_1, C_2, \ldots, C_k$.

ii) The probability of the ith outcome is $p_i$, which remains constant, and $\sum p_i = 1$.

iii) The successive trials are independent.
iv) The experiment is repeated a fixed number of times, say, n.
e . 

8.7.1 Derivation of Multinomial Distribution. If n independent trials be performed, then one
specified order in which $C_1$ occurs $x_1$ times, $C_2$ occurs $x_2$ times, ..., $C_k$ occurs $x_k$ times, where
$x_1 + x_2 + \cdots + x_k = n$, is

$$\underbrace{C_1C_1\ldots C_1}_{x_1\ \text{times}}\ \underbrace{C_2C_2\ldots C_2}_{x_2\ \text{times}}\ \ldots\ \underbrace{C_kC_k\ldots C_k}_{x_k\ \text{times}}$$

The probability of this happening by multiplicative law is

$$\underbrace{p_1p_1\ldots p_1}_{x_1\ \text{times}}\ \underbrace{p_2p_2\ldots p_2}_{x_2\ \text{times}}\ \ldots\ \underbrace{p_kp_k\ldots p_k}_{x_k\ \text{times}} = p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}.$$

But we are interested in events occurring in any order. Therefore the total number of mutually
exclusive orders in which this can happen is

$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}.$$

Hence the required multinomial probability is
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$$
$$= n!\prod_{i=1}^{k}\frac{p_i^{x_i}}{x_i!}, \quad \text{where } \sum_{i=1}^{k}x_i = n \text{ and } \sum_{i=1}^{k}p_i = 1.$$

The multinomial distribution takes its name from the fact that the above probabilities correspond to
terms in the multinomial expansion of $(p_1 + p_2 + \cdots + p_k)^{n}$. The parameters of this distribution are n,
$p_1, p_2, \ldots, p_k$.
The mean and variance of the multinomial distribution are
$E(X_i) = np_i$ and $Var(X_i) = np_iq_i$.
When k = 2, the multinomial distribution reduces to the binomial probability distribution. Thus
the binomial distribution is a special case of the multinomial distribution.
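Example 8.27 A box contains 5 red, 4 white and 3 blue marbles. A sample of six marbles is drawn
with replacement, i.e., each marble is replaced before the next one is drawn. Find the probability that out
of six marbles selected, 3 are red, 2 are white and one is blue.


Let $X_1$, $X_2$ and $X_3$ denote the numbers of red, white and blue marbles. Then

$$p_1 = P(\text{red}) = \frac{5}{12}, \quad p_2 = P(\text{white}) = \frac{4}{12}, \quad p_3 = P(\text{blue}) = \frac{3}{12}.$$

Hence
$$P(X_1 = 3, X_2 = 2, X_3 = 1) = \frac{6!}{3!\,2!\,1!}\left(\frac{5}{12}\right)^{3}\left(\frac{4}{12}\right)^{2}\left(\frac{3}{12}\right) = \frac{625}{5184}.$$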
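
The multinomial probability in the marble example can be reproduced exactly with rational arithmetic. The following is a minimal sketch; the function name `multinomial_pmf` is a hypothetical helper, not something defined in the text.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, returned as an exact Fraction."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)          # each division is exact, so // loses nothing
    prob = Fraction(coef)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return prob

# 3 red, 2 white and 1 blue in six draws with replacement
p = multinomial_pmf([3, 2, 1], [Fraction(5, 12), Fraction(4, 12), Fraction(3, 12)])
print(p, float(p))
```

Using `Fraction` keeps the answer in the same form as the hand calculation instead of a rounded decimal.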
EXERCISES

OBJECTIVE

a) Answer 'True' or 'False'. If the statement is not true then replace the underlined words with words
that make the statement true:

i) A discrete probability distribution is graphically represented by a curve.
ii) A binomial experiment always has three or more possible outcomes to each trial.
iii) Discrete random variables may assume any values.
iv) In a binomial experiment, the individual trials are independent of each other.
v) In a binomial distribution the mean is equal to its variance.
vi) In a hypergeometric probability distribution trials are dependent.
vii) The mean and variance of the Poisson distribution are not equal.
viii) The Poisson distribution is defined as a limiting approximation of the binomial distribution
when p is large, but n is small so that μ = np is of moderate size.
ix) Another name for Poisson probability distribution is the rare events distribution.
x) Poisson probability distribution has two parameters.
xi) In a negative binomial distribution, the successive trials are all dependent.
xii) The mean of the negative binomial distribution is greater than its variance.
xiii) The geometric probability distribution is symmetrical.
xiv) A binomial distribution is a special case of a multinomial distribution when each trial can
assume two possible outcomes.
xv) The outcomes of each trial for binomial distribution may be classified into one of ...
mutually exclusive categories.

b) MULTIPLE CHOICE QUESTIONS



Which of the following is not a property of a binomial experiment?


a) The successive trials are dependent.

b) The experiment is repeated a fixed number of times, say n.

c) The probability of success, denoted by p, remains constant for all trials.
d) There are three or more possible outcomes for each trial.


The standard deviation of the binomial distribution is:


a) np


For a binomial distribution, the mean and variance are related by:
a) $\mu < \sigma^2$
b) $\mu = \sigma^2$

For a binomial distribution $P(X = x) = {}^{12}C_x\,(0.5)^{x}(0.5)^{12-x}$, x = 0, 1, 2, ..., 12, the mean is


For a Poisson distribution, the mean and variance are' related by: 


a) uo 
b) ucc? 
c) uc 


d) None of above 


A binomial distribution may be approximated by a Poisson distribution when


a) n is large and p is small

b) n is small and p is large

c) n is small and p is small

d) n is large and p is large

Which of the following is not a property of a geometric experiment?


a) The probability of success changes on each trial.

b) The successive trials are independent.

c) The experiment is repeated a fixed number of times.

d) The outcomes of each trial may be classified into one of two categories, success and
failure.


For a negative binomial distribution, the mean and variance are related by:



a) j= 
b) uso 
c) ure 


d) None of above 


Which of the following is not a property of a multinomial experiment? 

a) The successive trials are all independent. 

b) The experiment is repeated a fixed number of times, . 

с) The outcomes of each trial may be classified into one of k categories (К > 2). 
d) The probability of success changes on each trial. 


SUBJECTIVE
8. а) What is a binomial experiment and what are its properties? 
b) Derive the binomial distribution and find its mean and variance | 
< (P.U., В.А 
82 а) The probability of an event occurring оп any one occasion is p. Prove that the 
its occurring on exactly x of л occasions is (re. where @=1-р. 
x 
(P.U., B.A./B.Sc 

b) Let X have а binomial distribution with — 7-3 and p= 

A Х = >| P(X =2), POX <2), P(X =-2) and P(X 22) 
(B.ZU., BAJ 
/83 s) Adie is rolled five times'and a 5 or 6 is considered a success. Find the probability 
success, (ii) at least 2 successes, (iii) at least one but re than 3 successes. 
(B.Z.U., B 

b) Using the binomial distribution, find the probabig dt 
i) 3successes in 8 trials when p = 0.4, SS 
ш 2 failures in 6 trials when p" 0.6, 2 
її) 2 or fewer successes in 9 trial р-04. (P.U., М.А. 

“54 a) Епа the probability of gettin: асау 4 heads and (ii) not more than 4 
coins are tossed. S 
b)  Findthe probability of (i S) more heads, (ii) fewer than 4 heads іп а single toss 
' coins. e 
85 а) Ifthe probability М caught copying someone else's exam is 0.2, find ће 
of not getting in 3 attempts. Assume independence, 

b) 160% of. Woters in a large district prefer candidate A, what is the probal 
sample of 12'voters exactly 7 will prefer 4? 

с) Тһе probability that a patient recovers from a delicate heart operation is 0.9. 
probability that exactly five of the next 7 patients having this operation survive? 

8.6 а) Тһе incidence of occupational disease in an industry is such that the workmen 
chance of suffering from it. What is the probability that out of 6 workmen (i) 
2, and (ii) 4 or more will catch the disease? (P.U. В. 

b) Ifon the average rain falls on twelve days in every thirty, find the probability 
three days of a given week will be fine and the remaining wet, (ii) rain will fall 
days of a given week. 

8.7 An insurance salesman sells policies to 5 men, all of identical age and in good health. According to
the actuarial tables, the probability that a man of this particular age will be alive 30 years hence is
2/3. Find the probability that in 30 years (i) all men, (ii) at least 3 men, (iii) only two men, (iv) at
most one man will be alive.

A multiple-choice quiz has 15 questions, each with 4 possible answers of which only 1 is the
correct answer. What is the probability that sheer guess work yields from 5 to 10 correct
answers?


A commuter drives to work each morning. The route she takes each day includes ten
stoplights. Assume the probability each stoplight is red when she gets to it is 0.2 and that
these stoplights (trials) are independent. What is the distribution for X, the number of times
she must stop for a red light on her way to work? Evaluate P(X = 0) and P(X ≤ 5).


Find the successive terms of the binomial frequency distribution 600 (0.3 + 0.7)°. 


Five dice are tossed 96 times. Find the expected frequencies when throwing of a 4, 5 or 6 is 
“regarded as a success. " 


А perfect cubic die is thrown a large number of times in sets of 8. The occurrence of a 5 or a 
6 is called a success. In what proportion of the sets would you expect 3 successes? 


An irregular six-faced die is thrown and the expectation that in 10 throws it will give five 
even numbers is twice the expectation that it will give four evepspumbers. How many times, 
10,000 sets of 10 throws would you expect it to give no evi er? 

e (P.U., B.A./B.Sc. 1975) 


See om te doncc sd 2 ЫЙ qs Cai e rani 


тез. Write down the theoretical frequencies of 0, 1, 2 4 sixes. Calculate the mean number 
ef sixes in a single throw. S 


©) Find the mean and standard deviation of YAomial distribution (q *py. 
Find the mean and variance of the bi (9+р)". (P.U., B.A./B.Sc. 1962, 69) 
A r.v. X is binomially жее але, mean 3 and variance 2, compute P(X=7). 


In a binomial distribution, d and the standard deviation were found to be 36 and 4.8 
respectively. Find p and ос (P.U., В.А./В.5с. (Hons), 1966) 


А random variable майлы, distributed with mean 12.38 and variance 8.64. Find л and 
P- ҳу (LU., М.А. Econ., 1989) 


Is it possible to have a binomial distribution with mean = 5 and s.d. = 3? 
(P.U., B.A./B.Sc. (Hons.) 1969) 


in the first four moments of a binomial distribution. _ 


; that in a binomial distribution where p is the probability of success, the moments about the 
are given Бун, —npg; p, -пра(4-р) и, = пра1+ 3(n – 2)рд). (РЛ), D. St. 1962) 


к that for the binomial distribution Р(х; n, p) 
- 2 - 

Ё = @-—2р) nd 8,-3 21-604 
пра npq 


Let X be a random variable having a binomial distribution ылы амы n=25 and p=0.2. 
Evaluate PLY <и-2с]. 


8.18 


8.19 


8.20 


8.21 


8.22 


Q5 8.23 
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(10 х А 
b) Given binomial probability function РОХ) | m ( 1) Find the 
\х AZ 
mode of the distribution. (PU, BAB 


A biased coin is tossed 4 times and the number of heads noted. The experiment ts ре 
times in all. The results obtained are shown in the table 


кт “т. жалады. | 
No. of heads 0 i 2 3 4 | 
į Frequency, i2 50 151 2 22 87 
a) Find the probability of obtaining a head when the com is tossed 


b) - Calculate the theoretical frequencies of 0. 1. 2. 3. 4 heads, using the associated 
binomial distribution 


The incidence of defective items in 200 samples of 6 is shown in the following table: 


No. of defectives 
per sample 


Assuming these results follow a Mun compute the theoretical 
probabilities and frequencies. 


ҳу 
Following data give nurnber of questions correctly answered out of 10 questions for 
questions. 


0 1 2 3 4 5 6 71-78 9 10 


[ж 
ТЕТІ Ж EN ЭТТ Сағ 897% 73 


Examine whether the distribution is binomial and find its mean and standard deviation. 


Show that, if two symmetrical binomial distributions (p = q= М) of degree л (and 

number of observations) are so superposed, the rth term of the one coincides with the ( 

of the other, the distribution formed by adding superposed terms is symmetrical bin 

(n1). 

a) If X has a binomial distribution d(x; n, p), then show that E(X) = np. Var(X) = 
M) = (а+реу. 

b) Ifthe m.g.f. of Xis Moli) = (1/4+(3/4) г)”, find E(X). Var(X) and Р(Х> 10). ) 
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c) Derive the m.g.f. of the binomial distribution. Use it to find the mean and variance of the 
binomial distribution. (P.U., B.AJB.Sc. 1992) 


Show that for the binomial distribution (g+p)", where р-1-4, 


dk 
ат —-, 21; and 
ra = Рӯ dp r an 


hence find out the first four cumulants. (P.U., M.Sc. (Stat.) 1969) 
a) What is a hypergeometric experiment and what are its properties? 
b)  Derive the hypergeometric probability distribution. 


а) Find the mean and variance of the hypergeometric distribution. 
(P.U., B.AJB.Sc, 1980, 82, 84, 91, 94) 


b) Determine the probability distribution for the number of white beads among 5 beads drawn 
at random from a bowl containing 4 white and 7 black beads. Use this to compute the mean 
and variance and check the results by using the formulas. : 


а) А committee of size 3 is selected from 4 men and 2 wo Ña the probability distribution 


for the number of men on the committee. ev 
5) А homeowner plants 6 bulbs selécted at rando $m a box containing 4 tulip bulbs and 4 
daffodil bulbs. What is the probability that he 2 daffodil bulbs and 4 tulip bulbs? 
Ten vegetable cans, all the same size, have Lost labels. It is known that 5 contain tomatoes and 
5 contain corn, If five are selected at ran ‘what is the probability that all contain tomatoes? 
What is the probability that 3 or more co omatoes? (B.Z.U., B.A./B.Sc. 1991) 
2) Determine the probability that me Tax Authorities will catch 3 income tax returns 


with illegitimate deducti: it randomly selects 6 returns from among 20 income tax 
returns of which 8 сопа тше deductions. 


To avoid detectio Rustoms, a traveller has placed six narcotic tablets in a bottle 
containing nine уй и?рїї< that are similar іп appearance. If the customs official selects 
3 of the tablets random for analysis, what is the probability that the traveller will be 
arrested for illegal possession of narcotics? 


Discuss the difference in conditions that must exist in a problem situation for application of 
the hypergeometric and the binomial distributions. 


Show that the hypergeometric distribution A(x; М, n, А) can be approximated by the binomial 
distribution for large N and К. (P.U., B.A./B.Sc. 1994) 


random sampling of 4 members of a 150 members club has shown that 3 prefer no smoking in 

clubhouse dining room. What is the probability that this. will occur if in fact only 20% of 

5 prefer no smoking in the dining room. Find this probability assuming that the sample was 
under 


sampling without replacement, and 
sampling with replacement. 
Compare the two answers. / 


336 
+632 а) 


с) 


8.33 а) 


b) 


с): 


8.34 а) 


b) 


8.35 a) 
b) 


8.36 а) 


b) 


8.37 а) 
n 
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Describe a Poisson distribution. 


Derive the Poisson distribution as the limiting form of the binomial distribution, 
clearly the assumptions you make, (P.U., В.А./В.ӛс. 1975, 


If X is a Poisson random variable with д = 1.6, find Р(Х-0), Р(Х=1), Р(Х=2) and PU 


d "ax 
А Poisson distribution is given by P(X = х) = =, Find the probabilities for x =й. 
x: 
3 and 4. (P.U., B.AJB.Sc. 1 
Suppose that X has a Poisson distribution. If P(X = 1) = 0.3 and P(X = 2) = 0.2, calculate
P(X = 0) and P(X = 3).
A random variable X has a Poisson distribution such that P(X-2) = 3P(X=4). Find 


and P(X <3) upto three places of decimals: (P.U., B.A/B.Se 


е 


Prove that, if $n \to \infty$ and $p \to 0$ such that $np = \mu$, then

$$\binom{n}{x} p^x (1-p)^{n-x} \to \frac{e^{-\mu}\mu^x}{x!},$$

where $e = \lim_{n\to\infty} (1 + 1/n)^n = 2.7183$ approximately. (P.U., B.A./B.Sc.


Past experience in the production of a certain component has shown that the proportion of
defectives is 0.03. Components leave the factory in boxes of 500. What is the probability that

i) a box contains 3 or more defectives;

ii) two successive boxes contain 6 or more defectives between them?
(P.U., B.A./B.Sc.


Define the Poisson distribution and derive its mean and variance.

Ten percent of the tools produced in a certain manufacturing process turn out to be defective.
Find the probability that in a sample of 10 tools chosen at random, exactly two will be
defective by using (i) the binomial distribution and (ii) the Poisson approximation to the
binomial distribution. (P.U., B.A./B.Sc.


Suppose that the number of insurance claims closely approximates a Poisson distribution
with $\mu = 0.05$. Find the probability of (i) no claim and (ii) 1 or fewer claims.


Assume that the probability of being killed in an accident in a coal mine during a year is
1/1400. Use the Poisson distribution to calculate the probability that in the mine employing
350 miners, there will be at least one fatal accident in a year.


A secretary makes 2 errors per page on the average. What is the probability that on the next
page she makes (i) 4 or more errors? (ii) no error?


A car hire firm has 2 cars, which it hires out day by day. The number of demands for a car
on each day is distributed as a Poisson distribution with parameter 1.5. Calculate the
proportion of days on which neither car is used, and the proportion of days on which some
demand is refused. [$e^{-1.5} = 0.2231$]. (B.Z.U., B.A./B.Sc. 1990)


A manufacturer of cotter pins knows that 5 percent of his product is defective. If he sells cotter pins
in boxes of 100, and guarantees that not more than 4 pins will be defective, what is the approximate
probability that a box will fail to meet the guaranteed quality? ($e^{-5} = 0.0067$)

(B.Z.U., B.A./B.Sc. 1988)


a) Find the mean and the variance of the Poisson distribution. State the relation between
binomial and Poisson distributions. (P.U., B.A./B.Sc. 1970)


b) Given that X has a Poisson distribution with variance 1, calculate P(X = 2).


Criticise the following statement:
"The mean of a Poisson distribution is 5 while its standard deviation is 4."

In a Poisson distribution the first two frequencies were 250 and 60. Find the frequencies of
the next two values of the variable.

Show that the mean and the variance of a Poisson distribution are equal.
(P.U., B.A./B.Sc. 1986)

Find the first four moments of the Poisson distribution $p(x; \mu)$, and hence prove that
$\beta_1 = 1/\mu$ and $\beta_2 = 3 + 1/\mu$. (P.U., B.A./B.Sc. 1983, 88)

Show that the discontinuous (Poisson) distribution whose probabilities corresponding to the values
0, 1, 2, ... are

$$e^{-\lambda},\quad e^{-\lambda}\frac{\lambda}{1!},\quad e^{-\lambda}\frac{\lambda^2}{2!},\ \dots$$

has the second, third, fourth and fifth central moments given by

$$\mu_2 = \mu_3 = \lambda,\quad \mu_4 = \lambda(1+3\lambda),\quad \mu_5 = \lambda(1+10\lambda).$$

(P.U., M.A. Stats. 1969)


Show that, if $X_1$ and $X_2$ are independent Poisson variables with parameters $\lambda_1$ and $\lambda_2$ respectively,
then $Y = X_1 + X_2$ is a Poisson variable with parameter $\lambda_1 + \lambda_2$.


Does the difference of two Poisson variables follow Poisson distribution?


Fit a Poisson distribution to "Student's" yeast cell data.


| Frequency | 213 12:33) 748 ^ 34" 4 


b) The frequency of accidents per shift in a factory is shown in the following table:


Accidents per shift | 0   1   2   3   4   5
Frequency           | 300 96  34  9   1   0


Use the Poisson distribution to estimate the probability of

i) no accidents in a shift, ii) more than one accident in a shift.
(P.U., B.A./B.Sc. (


A skilled typist, on routine work, kept a record of mistakes made per day during 300


Mistakes per day | 0   1   2   3   4   5   6
No. of days      | 143 90  42  12  9   3   1


Compute the frequencies of the Poisson distribution which has the same total
as the above distribution.

! 4 

The number of road accidents notified rag police station per day is shown in 
frequency table relating to a period of NA Xuccessive days: 


Calculate the mean пиц 8 accidents a day. Use the Poisson distribution, with 

calculate the expected, i equencies. Assuming that this distribution continues to ag 

probability of 4 or "accidents being notified on any one day. 
XJ 


a3) Define Aon process. What are its properties or assumptions? 


b) Suppose that customers enter a certain shop at the rate of 30 persons an 
Poisson distribution, calculate the probability that in a 3-minutes interval, пе 
enter the shop. (P.U. 


a) Flaws in plywood occur at random with an average of one flaw рет 50 
the probability that a 4 feet x 8 feet sheet will have no flaws? At most one 

b) A doctor receives an average of 3 telephone calls from 9 p.m. until 9 a.m. the 
Assuming arrivals of calls are.a Poisson process, what is the probability 

not be disturbed by a call if she goes to bed at midnight and rises at 6 a.m.? 
(PU 


A computer system in a company has a breakdown once in 25 days, on the a 
breakdowns are a Poisson process, what is the probability of (i) exactly one bi 
10 days? (1i) more than one breakdown in the next 10 days? 
The number of cars passing over a toll bridge during the time interval 10 to 11 a.m. is 200. The
cars pass individually and collectively at random. Find the probability that

i) not more than 4 cars will pass during the 1-minute interval 10:45 to 10:46,

ii) 5 or more cars will pass during the same interval.

Find the m.g.f. about the mean for a Poisson distribution and use it to deduce the moment ratios $\beta_1$
and $\beta_2$ for the distribution.

Use the m.g.f. to prove that the sum of two independent Poisson variables is a Poisson variable.


Suppose that the probability of an insect laying n eggs is a Poisson distribution with mean m,
and that the probability of an egg developing is p. Assuming mutual independence of the
eggs, show that the probability of a total of k survivors is given by the Poisson distribution
with parameter mp. (P.U., M.A. Stat. 1969)


What is a negative binomial experiment and what are its properties? Derive the negative
binomial distribution.

The probability that a swimmer will succeed in swimming across a lake is 0.4. What is the
probability that the tenth swimmer is the fourth one to cross the lake?

If X has a negative binomial distribution, then show that $E(X) = kq/p$ and $\mathrm{Var}(X) = kq/p^2$.

If $E(X) = 10$ and $\sigma = 3$, can X have a negative binomial distribution?

Find the probability that a person flipping a coin gets the third head on the seventh flip.


In each of a succession of independent trials, the probability of a certain event A is p. Trials
are continued until the event A has been observed exactly k times. If X be the number of
trials, show that the distribution of X is

$$P(X = x) = \binom{x-1}{k-1} p^k q^{x-k}, \qquad x = k, k+1, \dots$$

Find the expected value and standard deviation of X.


The probability that a person will install a black telephone in a residence is estimated to be 
0.3. Find the probability that the 10th phone installed in a new sub-division is the 5th black 


phone. 


Describe the negative binomial distribution and show that its variance is greater than its 
mean. 


Calculate the first four cumulants for the negative binomial distribution. 
(P.U., М.А. Stat. 1969) 


What is a geometric experiment and what are its properties? Derive the geometric
probability distribution.


Show that the mean and variance of the geometric distribution are $\mu = 1/p$ and $\sigma^2 = q/p^2$.


When flipping an unbiased coin, determine the probability that the first head occurs on the 
“Жегі vial. - 


What is a multinomial experiment and what are its properties? Derive the formula for the
multinomial distribution.

Find the probability of being dealt a bridge hand of 13 cards containing 5 spades, 2 hearts,
3 diamonds and 3 clubs.

The painted light bulbs produced by a company are 50% red, 30% blue and 20% green. In a
sample of 5 bulbs, find the probability that 2 are red, 1 is green and 2 are blue.


A box contains 5 red, 3 white and 2 blue marbles. A sample of 6 marbles is drawn with
replacement. Find the probability that (i) 3 are red, 2 are white and 1 is blue, (ii) 2 are red, 3
are white and 1 is blue; (iii) 2 of each colour appear.


Derive the mean of hypergeometric distribution. 


The reception office at a building receives an average of 4.9 phone calls per 
the probabilities of reteiving exactly 6 phone calls at this office during 
i) half hour ii) an hour 



Derive the Poisson distribution as the limiting form of the binomial distribution, stating
clearly the assumptions you make. Also derive the moment generating function of the
Poisson distribution.

During a promotional campaign for a new drink, a soft drink company places winning
caps on one of every ten bottles. Hoping to win a prize, a child decides to buy a bottle of the
new cola each day for one week. What is the probability that the child wins
i) at least one day? ii) first two days? iii) all days?

i Мы (Р.В 


9.1 INTRODUCTION


In this chapter, we shall consider some important continuous probability distributions or density
functions which are met in practice. Of all the continuous probability distributions, the normal
distribution is perhaps the most important distribution which is used extensively in solving problems both
in probability and in statistical inference.


9.2 UNIFORM DISTRIBUTION


The density function of a continuous r.v. X is called a uniform distribution when, between the end
points a and b, any two subintervals of the same length containing X have the same probability.


Alternatively, a r.v. X is said to be uniformly distributed if its density function is

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases}$$

The distribution derives its name from the fact that its density is constant or uniform over the
interval (a, b) and is 0 elsewhere.


It is also called the rectangular distribution because the total probability is confined to a rectangular
region with base equal to (b-a) and height 1/(b-a). The parameters of this distribution are a and b. Since
X with this distribution is a random variable, we must have that

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{1}{b-a}\,dx = 1.$$

This distribution arises in the study of rounding off errors, etc. Its distribution function will be

$$F(x) = \begin{cases} 0, & \text{for } x < a, \\ \dfrac{x-a}{b-a}, & \text{for } a \le x \le b, \\ 1, & \text{for } x > b. \end{cases}$$


9.2.1 Properties of the Uniform Distribution.


1. Let X have the uniform distribution over [a, b]. Then its mean is $\dfrac{a+b}{2}$ and variance is
$\dfrac{(b-a)^2}{12}$.

$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}, \text{ the midpoint of the interval.}$$

And $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where

$$E(X^2) = \int_a^b \frac{x^2}{b-a}\,dx = \frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3}.$$

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}.$$
2. The shape of the distribution is rectangular. 
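
The mean and variance formulas for the uniform distribution can be checked numerically. The following is only a sketch: the function name `uniform_moments` and the end points a = 2, b = 8 are illustrative choices, not taken from the text, and the integrals are approximated by a simple midpoint rule.

```python
def uniform_moments(a, b, steps=100_000):
    """Midpoint-rule estimates of E(X) and Var(X) for the density 1/(b-a) on [a, b]."""
    h = (b - a) / steps
    density = 1.0 / (b - a)
    m1 = m2 = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h          # midpoint of the i-th subinterval
        m1 += x * density * h          # contributes to E(X)
        m2 += x * x * density * h      # contributes to E(X^2)
    return m1, m2 - m1 ** 2

mean, var = uniform_moments(2.0, 8.0)
print(mean, (2.0 + 8.0) / 2)           # numerical mean vs (a+b)/2
print(var, (8.0 - 2.0) ** 2 / 12)      # numerical variance vs (b-a)^2/12
```

For these end points both pairs of printed numbers should agree closely, with the mean near 5 and the variance near 3.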


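9.2.2 Moment Generating Function of the Uniform Distribution. The m.g.f. is obtained as follows:

$$M_0(t) = E(e^{tX}) = \int_a^b \frac{e^{tx}}{b-a}\,dx = \frac{1}{b-a}\left[\frac{e^{tx}}{t}\right]_a^b = \frac{e^{bt}-e^{at}}{t(b-a)}, \qquad t \ne 0.$$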


9.3 EXPONENTIAL DISTRIBUTION


A random variable X is said to have an exponential distribution with parameter $\lambda$, if its p.d.f. is
defined by

$$f(x) = \lambda e^{-\lambda x}, \quad \text{for } x > 0,$$
$$\phantom{f(x)} = 0, \quad \text{elsewhere,}$$
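where $\lambda > 0$. The p.d.f. may also be written as

$$f(x) = \frac{1}{\theta} e^{-x/\theta}, \quad \text{for } x > 0,$$
$$\phantom{f(x)} = 0, \quad \text{elsewhere,}$$

with $\theta = 1/\lambda$.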
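The function f(x) is a proper p.d.f. since

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^{\infty} = 1.$$

The distribution function of the exponential r.v. X is given by

$$F(x) = 1 - e^{-\lambda x}, \quad \text{for } x > 0,$$
$$\phantom{F(x)} = 0, \quad \text{elsewhere,}$$

so that

$$P(X > x) = e^{-\lambda x}.$$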


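9.3.1 Properties. The following are some of the important properties of the exponential
distribution with parameter $\lambda$.


1. The mean and standard deviation of the exponential distribution are equal.

$$\text{Now } E(X) = \mu = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx.$$

To integrate it by parts, we use the formula $\int u\,dv = uv - \int v\,du$ and make the substitution
$\lambda e^{-\lambda x}\,dx = dv$ and $x = u$, so that $v = -e^{-\lambda x}$ and $du = dx$. Then we have

$$\int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda},$$

so that $E(X)$ or $\mu = \dfrac{1}{\lambda}$.

Again $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where

$$E(X^2) = \int_0^{\infty} x^2\,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2}.$$

$$\mathrm{Var}(X) = \sigma^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2},$$

and hence $\sigma = \dfrac{1}{\lambda}$.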
2. The distribution is extremely skewed and thus there does not exist any mode. 


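9.3.2 Moment Generating Function of Exponential Distribution. The m.g.f. is obtained as
follows:

$$M_0(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\,dx$$

$$= \lambda \left[\frac{-e^{-(\lambda-t)x}}{\lambda - t}\right]_0^{\infty} = \frac{\lambda}{\lambda - t}, \qquad \text{for } t < \lambda.$$


Example 9.1 The duration of long-distance telephone calls is found to be exponentially distributed
with a mean of 3 minutes. What is the probability that a call will last (i) more than 3 minutes, (ii) more
than 5 minutes?

Let X be the exponential r.v. with parameter $\lambda$. Then the mean, i.e. $\dfrac{1}{\lambda} = 3$, so that $\lambda = \dfrac{1}{3}$.


Now (i) the probability that a call will last more than 3 minutes is given by

$$P(X > 3) = \int_3^{\infty} \frac{1}{3} e^{-x/3}\,dx = \left[-e^{-x/3}\right]_3^{\infty} = e^{-1} = 0.3679,$$

and (ii) the probability that a call will last more than 5 minutes is given by

$$P(X > 5) = \int_5^{\infty} \frac{1}{3} e^{-x/3}\,dx = \left[-e^{-x/3}\right]_5^{\infty} = e^{-5/3} = 0.1889.$$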
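
The two tail probabilities of Example 9.1 follow directly from $P(X > x) = e^{-\lambda x}$. A minimal sketch is given below; the helper name `exponential_tail` is an illustrative assumption, and only the standard `math` module is used.

```python
import math

def exponential_tail(x, mean):
    """P(X > x) = exp(-lambda * x) for an exponential r.v. with the given mean 1/lambda."""
    lam = 1.0 / mean
    return math.exp(-lam * x)

p_more_than_3 = exponential_tail(3, mean=3)   # e^(-1)
p_more_than_5 = exponential_tail(5, mean=3)   # e^(-5/3)
print(round(p_more_than_3, 4))
print(round(p_more_than_5, 4))
```

The same function gives the distribution function as `1 - exponential_tail(x, mean)`.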


9.4 GAMMA AND BETA DISTRIBUTIONS


The gamma and beta distributions derive their names from the well known gamma and beta
functions which are very important in many areas of probability theory and mathematics. Therefore,
before proceeding to these distributions, it is appropriate to review the gamma and beta functions and
some of their main properties.


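9.4.1 Gamma Function. The gamma function for any number n > 0, denoted by $\Gamma(n)$, is defined
by

$$\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx.$$

We can give an alternative meaning to $\Gamma(n)$ by proving that $\Gamma(n+1) = n!$.

Now by definition, $\Gamma(n+1) = \int_0^{\infty} x^{n} e^{-x}\,dx$, $n > 0$.

To integrate the above integral by parts, we use the formula $\int u\,dv = uv - \int v\,du$ and make the
substitutions $u = x^n$, $dv = e^{-x}dx$ so that $du = n x^{n-1}dx$ and $v = -e^{-x}$. Then we obtain

$$\Gamma(n+1) = \left[-x^n e^{-x}\right]_0^{\infty} + n\int_0^{\infty} x^{n-1} e^{-x}\,dx = n\,\Gamma(n), \text{ by the definition of the } \Gamma\text{-function.}$$

Similarly,

$$\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx = \left[-x^{n-1} e^{-x}\right]_0^{\infty} + (n-1)\int_0^{\infty} x^{n-2} e^{-x}\,dx = (n-1)\,\Gamma(n-1).$$

Let n be a positive integer. Then repeating the use of $\Gamma(n) = (n-1)\,\Gamma(n-1)$, we get

$$\Gamma(n+1) = n\,(n-1)\,\Gamma(n-1) = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1\cdot\Gamma(1).$$

Putting n = 0 in $\Gamma(n+1)$, we find that

$$\Gamma(1) = \int_0^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_0^{\infty} = 1.$$

Hence $\Gamma(n+1) = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1 = n!$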
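
The relation $\Gamma(n+1) = n!$ and the recurrence $\Gamma(n+1) = n\,\Gamma(n)$ can be checked with Python's standard `math.gamma`. This is only a numerical illustration; the range of n and the non-integer value 2.5 are arbitrary choices.

```python
import math

for n in range(0, 8):
    # math.gamma(n + 1) should agree with n! for non-negative integers n
    print(n, math.gamma(n + 1), math.factorial(n))

# The recurrence Gamma(n + 1) = n * Gamma(n) also holds for non-integer n > 0
n = 2.5
print(math.gamma(n + 1), n * math.gamma(n))
```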
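Writing $x = y^2$ in $\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx$, we have

$$\Gamma(n) = 2\int_0^{\infty} e^{-y^2}\, y^{2n-1}\,dy,$$

which is another form of the $\Gamma$-function.


Putting $n = \tfrac{1}{2}$ in this form, we obtain

$$\Gamma\!\left(\tfrac{1}{2}\right) = 2\int_0^{\infty} e^{-y^2}\,dy.$$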
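9.4.2 Beta Function. The beta function for any two positive numbers m, n, denoted by B(m, n), is
defined by

$$B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx, \quad \text{for } m > 0,\ n > 0.$$

When m = n = 1, $B(1, 1) = \int_0^1 x^0 (1-x)^0\,dx = 1$.

Writing $z = 1 - x$, we find that

$$B(m, n) = -\int_1^0 (1-z)^{m-1} z^{n-1}\,dz = \int_0^1 (1-z)^{m-1} z^{n-1}\,dz = B(n, m).$$

Hence the B-function is symmetrical about m and n.

Now let $x = \sin^2\theta$ so that $dx = 2\sin\theta\cos\theta\,d\theta$. Then the substitution gives

$$B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$

Again, the substitution $x = \dfrac{z}{1+z}$ gives

$$B(m, n) = \int_0^{\infty} \frac{z^{m-1}}{(1+z)^{m+n}}\,dz, \text{ which is also a form of the B-function.}$$

It is interesting to note that the gamma and beta functions are related according to the
formula:

$$B(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m+n)}, \quad \text{for } m > 0,\ n > 0.$$

Putting $m = n = \tfrac{1}{2}$, we have

$$\frac{\left[\Gamma\!\left(\tfrac{1}{2}\right)\right]^2}{\Gamma(1)} = B\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = 2\int_0^{\pi/2} d\theta = \pi.$$

Hence $\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}$ or $\int_0^{\infty} e^{-y^2}\,dy = \dfrac{\sqrt{\pi}}{2}$.

This is a very important result.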
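
As a numerical illustration of the relation between the beta and gamma functions, the sketch below compares $\Gamma(m)\Gamma(n)/\Gamma(m+n)$ with a midpoint-rule estimate of the trigonometric form of B(m, n), and checks that $[\Gamma(1/2)]^2 = \pi$. The function names `beta_fn` and `beta_trig` and the values m = 2, n = 3 are illustrative assumptions.

```python
import math

def beta_fn(m, n):
    """B(m, n) computed from the gamma-function relation."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

def beta_trig(m, n, steps=200_000):
    """Midpoint-rule estimate of 2 * integral_0^{pi/2} sin^(2m-1) cos^(2n-1) d(theta)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.sin(t) ** (2 * m - 1) * math.cos(t) ** (2 * n - 1)
    return 2 * total * h

print(beta_fn(2, 3), beta_trig(2, 3))    # both close to 1/12
print(math.gamma(0.5) ** 2, math.pi)     # Gamma(1/2)^2 equals pi
```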


9.4.3 Gamma Distribution. A continuous r.v. X is said to have a gamma distribution with parameter
m > 0, if its p.d.f. is defined by

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(m)}\, x^{m-1} e^{-x}, & \text{for } 0 \le x < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

A gamma variable with parameter m is usually denoted by $\gamma(m)$. A straightforward integration shows
that $\int_0^{\infty} f(x)\,dx = 1$ and hence it represents a p.d.f.

The distribution function F(x) is

$$F(x) = \begin{cases} \dfrac{1}{\Gamma(m)} \displaystyle\int_0^x u^{m-1} e^{-u}\,du, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

It is also called the Incomplete Gamma Function and has been tabulated by Karl Pearson.


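9.4.4 Properties of Gamma Distribution. The important properties of the gamma distribution
are as follows:


1. The mean and variance of the gamma distribution are equal to its parameter m.

$$\text{Now, } \mu = E(X) = \int_0^{\infty} x\,\frac{1}{\Gamma(m)}\, x^{m-1} e^{-x}\,dx = \frac{1}{\Gamma(m)}\int_0^{\infty} x^{m} e^{-x}\,dx = \frac{\Gamma(m+1)}{\Gamma(m)} = \frac{m\,\Gamma(m)}{\Gamma(m)} = m,$$

and $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$, where

$$E(X^2) = \int_0^{\infty} x^2\,\frac{1}{\Gamma(m)}\, x^{m-1} e^{-x}\,dx = \frac{\Gamma(m+2)}{\Gamma(m)} = \frac{(m+1)\,m\,\Gamma(m)}{\Gamma(m)} = m(m+1).$$

$$\sigma^2 = m(m+1) - m^2 = m.$$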


Hence mean and variance are each equal to m. 
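
The equality of the mean and variance of the gamma distribution with its parameter m can be checked by numerical integration. This is a rough sketch, assuming the illustrative value m = 3, a midpoint rule, and truncation of the infinite range at x = 80, beyond which the integrand is negligible.

```python
import math

def gamma_moments(m, upper=80.0, steps=400_000):
    """Midpoint-rule estimates of E(X) and Var(X) for f(x) = x^(m-1) e^(-x) / Gamma(m)."""
    h = upper / steps
    norm = math.gamma(m)
    m1 = m2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        fx = x ** (m - 1) * math.exp(-x) / norm
        m1 += x * fx * h
        m2 += x * x * fx * h
    return m1, m2 - m1 ** 2

mean, var = gamma_moments(3.0)
print(mean, var)   # both should be close to m = 3
```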


2. The curve of f(x) is asymptotic to the X-axis. If m > 1, the curve has a mode at x = m - 1.
If m > 2, it touches the X-axis at the origin.


3. The curve becomes asymptotic to both axes when m lies between zero and one.


4. Reproductive Property. The sum of two independent Gamma distributions with parameters m
and n is a Gamma distribution with parameter (m + n).


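9.4.5 Moment Generating Function of Gamma Distribution. The m.g.f. of X with respect to the
origin is

$$M_0(t) = E(e^{tX}) = \frac{1}{\Gamma(m)}\int_0^{\infty} e^{tx}\, x^{m-1} e^{-x}\,dx = \frac{1}{\Gamma(m)}\int_0^{\infty} x^{m-1} e^{-x(1-t)}\,dx.$$

Let $u = x(1-t)$, so that $dx = \dfrac{du}{1-t}$. Then substituting these values, we get

$$M_0(t) = \frac{1}{\Gamma(m)}\int_0^{\infty} e^{-u}\left(\frac{u}{1-t}\right)^{m-1}\frac{du}{1-t} = \frac{1}{(1-t)^m\,\Gamma(m)}\int_0^{\infty} u^{m-1} e^{-u}\,du$$

$$= (1-t)^{-m}, \quad \text{provided that } |t| < 1.$$

On differentiating $M_0(t)$ r times with respect to t and putting t = 0, we find

$$\mu'_r = m(m+1)\cdots(m+r-1).$$

Thus $\mu'_1 = m$, $\mu'_2 = m(m+1)$ and hence $\mu_2 = \mu'_2 - \mu'^{\,2}_1 = m$, etc.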
The cumulant generating function for this distribution with respect to origin is:

$$K(t) = \log_e M_0(t) = -m \log(1 - t).$$

Thus the rth cumulant is given as

$$\kappa_r = m\,(r-1)! = m\,\Gamma(r).$$
9.4.6 Beta Distribution of the First Kind. A continuous r.v. X is said to have a beta distribution
with two parameters m and n, if its p.d.f. is defined by

$$f(x) = \begin{cases} \dfrac{1}{B(m, n)}\, x^{m-1}(1-x)^{n-1}, & 0 < x < 1;\ m, n > 0, \\ 0, & \text{elsewhere.} \end{cases}$$

This distribution is known as a beta distribution of the first kind and a beta variable of the first kind
is briefly referred to as $\beta_1(m, n)$. It is a proper probability density function since the area under the curve
is one.

The distribution function F(x) is given by

$$F(x) = \begin{cases} 0, & \text{for } x < 0, \\ \dfrac{1}{B(m, n)} \displaystyle\int_0^x u^{m-1}(1-u)^{n-1}\,du, & \text{for } 0 < x < 1, \\ 1, & \text{for } x > 1. \end{cases}$$

This is also called the incomplete beta function, and it has been extensively tabulated.
9.4.7 Properties of $\beta_1(m, n)$. The main properties of this distribution are given below:


1. The mean and variance of this distribution are $\dfrac{m}{m+n}$ and $\dfrac{mn}{(m+n)^2(m+n+1)}$ respectively.


They are computed as below:

$$\mu = E(X) = \frac{1}{B(m, n)}\int_0^1 x^{m-1}(1-x)^{n-1}\cdot x\,dx = \frac{1}{B(m, n)}\int_0^1 x^{m}(1-x)^{n-1}\,dx = \frac{B(m+1, n)}{B(m, n)}$$

$$= \frac{m\,\Gamma(m)\,\Gamma(n)}{(m+n)\,\Gamma(m+n)}\cdot\frac{\Gamma(m+n)}{\Gamma(m)\,\Gamma(n)} = \frac{m}{m+n}.$$

$\mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2$, where

$$E(X^2) = \int_0^1 x^2 f(x)\,dx = \frac{1}{B(m, n)}\int_0^1 x^{m+1}(1-x)^{n-1}\,dx = \frac{B(m+2, n)}{B(m, n)} = \frac{m(m+1)}{(m+n)(m+n+1)}.$$

$$\sigma^2 = \frac{m(m+1)}{(m+n)(m+n+1)} - \frac{m^2}{(m+n)^2} = \frac{mn}{(m+n)^2(m+n+1)}.$$
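
The mean and variance of the beta distribution of the first kind can be checked from ratios of beta functions, exactly as in the derivation. The following sketch assumes the illustrative parameters m = 4 and n = 3; the helper names `beta_fn` and `beta1_mean_var` are not from the text.

```python
import math

def beta_fn(m, n):
    """B(m, n) from the gamma-function relation."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)

def beta1_mean_var(m, n):
    """Mean and variance of beta_1(m, n) from E(X) = B(m+1, n)/B(m, n), E(X^2) = B(m+2, n)/B(m, n)."""
    ex = beta_fn(m + 1, n) / beta_fn(m, n)
    ex2 = beta_fn(m + 2, n) / beta_fn(m, n)
    return ex, ex2 - ex ** 2

m, n = 4, 3
mean, var = beta1_mean_var(m, n)
print(mean, m / (m + n))                              # E(X) vs m/(m+n)
print(var, m * n / ((m + n) ** 2 * (m + n + 1)))      # Var(X) vs mn/((m+n)^2 (m+n+1))
```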


2. Higher Moments. The rth moment about the origin 0 is given by

$$\mu'_r = E(X^r) = \frac{1}{B(m, n)}\int_0^1 x^r\cdot x^{m-1}(1-x)^{n-1}\,dx = \frac{B(m+r, n)}{B(m, n)} = \frac{\Gamma(m+r)\,\Gamma(m+n)}{\Gamma(m)\,\Gamma(m+r+n)}$$

$$= \frac{m(m+1)\cdots(m+r-1)}{(m+n)(m+n+1)\cdots(m+n+r-1)}.$$

It should be noted that the m.g.f. of the beta distribution does not have a simple form.

3. The shape of the beta distribution for m = 4, n = 3 is indicated in the figure.

[Figure: curve of f(x) for the beta distribution with m = 4, n = 3 over 0 < x < 1]

4. If m and n are both greater than 1, it has a modal value at $x = \dfrac{m-1}{m+n-2}$.

5. If m = n = 1, it reduces to the uniform distribution over the unit interval. The
curve of f(x) touches the X-axis at x = 0, when m > 2.

9.4.8 Beta Distribution of Second Kind. A continuous r.v. X is said to have a beta distribution of
the second kind with parameters m and n, if its p.d.f. is defined by

$$f(x) = \begin{cases} \dfrac{1}{B(m, n)}\,\dfrac{x^{m-1}}{(1+x)^{m+n}}, & \text{for } 0 \le x < \infty;\ m, n > 0, \\ 0, & \text{otherwise.} \end{cases}$$

A beta variable of the second kind is generally denoted by $\beta_2(m, n)$. To check that the function
represents a proper probability distribution, we observe that

$$\int_0^{\infty} f(x)\,dx = \frac{1}{B(m, n)}\int_0^{\infty} \frac{x^{m-1}}{(1+x)^{m+n}}\,dx.$$

Let $1 + x = \dfrac{1}{y}$, so that $x = \dfrac{1-y}{y}$ and $dx = -\dfrac{dy}{y^2}$. Then substitution gives

$$\int_0^{\infty} f(x)\,dx = \frac{1}{B(m, n)}\int_0^1 y^{n-1}(1-y)^{m-1}\,dy = 1.$$


9.4.9 Properties of $\beta_2(m, n)$. The important properties of beta distribution of the second kind are
as follows:


Moments about the origin are easily calculated as follows:

$$\mu'_r = E(X^r) = \frac{1}{B(m, n)}\int_0^{\infty} \frac{x^{m+r-1}}{(1+x)^{m+n}}\,dx = \frac{1}{B(m, n)}\int_0^1 y^{n-r-1}(1-y)^{m+r-1}\,dy, \quad \text{on putting } 1 + x = \frac{1}{y},$$

$$= \frac{B(m+r, n-r)}{B(m, n)} = \frac{m(m+1)\cdots(m+r-1)}{(n-1)(n-2)\cdots(n-r)}, \quad \text{for } r < n.$$

If m > 1, the distribution is unimodal with a mode at $x = \dfrac{m-1}{n+1}$. If m = 1, the distribution is
J-shaped.

The curve of f(x) is asymptotic to the X-axis and it touches it at the origin if m > 2.

The curve touches the Y-axis at the origin if 1 < m < 2, and the curve becomes asymptotic to both
axes when m lies between 0 and 1.


9.5 NORMAL DISTRIBUTION


The normal probability distribution, which is considered the cornerstone of the modern statistical
theory, was discovered by Abraham de Moivre (1667-1754) as the limiting form of the binomial
distribution by increasing n, the number of trials, to a very large number for a fixed value of p. But his
work remained unnoticed until 1924 when it was found in a library by Karl Pearson (1857-1936). The
name of P. S. Laplace (1749-1827) is also associated with the derivation of the normal distribution.
The normal distribution is also called the Gaussian distribution in honour of the great German
mathematician Carl F. Gauss (1777-1855), who also derived its equation mathematically as the
probability distribution of the errors of measurements. It was Karl Pearson who in 1893 called it the
normal distribution and it is best known by this name today.


A normal distribution is defined by the p.d.f.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad \text{for } -\infty < x < \infty, \text{ and } \sigma > 0,$$

where $\mu$ is the mean, $\sigma$ is the standard deviation and $\pi$ (= 3.1416) and e (= 2.7183) are constants.
Obviously a normal distribution is characterized by two parameters $\mu$ and $\sigma$, its mean and standard
deviation. Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (see properties), the function f(x) is a proper probability density function.
The normal distribution having mean $\mu$ and variance $\sigma^2$ is usually denoted by $N(\mu, \sigma^2)$. Thus X is
$N(\mu, \sigma^2)$ means that a r.v. X is normally distributed with mean $\mu$ and variance $\sigma^2$. The graph of the
normal distribution, which is a symmetrical bell-shaped curve, is called the normal curve. The position and
shape of the normal curve are determined by $\mu$ and $\sigma$. To put it differently, $\mu$ changes the position of
the normal curve along the horizontal axis while $\sigma$ determines its horizontal spread. A sketch for given
values of $\mu$ and $\sigma$ is given below:

[Figure: the normal curve f(x)]

The distribution function of the normal probability distribution is given by

$$F(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-(t-\mu)^2/(2\sigma^2)}\,dt,$$

which is sketched below:

[Figure: the distribution function F(x), with the horizontal axis marked at $\mu - \sigma$, $\mu$ and $\mu + \sigma$]

This curve is the ogive of the normal curve.
9.5.1 Standardized Normal Distribution. A normal probability distribution depends on the
values of the parameters $\mu$ and $\sigma^2$, and the various possible values for these two parameters will result in
an unlimited number of different normal distributions. The r.v. $Z = \dfrac{X - \mu}{\sigma}$, as we have seen, has zero
mean and unit variance. Every normally distributed r.v. X with mean = $\mu$ and variance = $\sigma^2$ is therefore
conveniently transformed into a new normal r.v. Z with zero mean and unit variance by using the
following expression

$$Z = \frac{X - \mu}{\sigma}.$$

Then the p.d.f. of Z, denoted by $\phi(z)$ ($\phi$ is pronounced phi) becomes

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad \text{for } -\infty < z < \infty.$$

The following figures illustrate the transformation of the original normal distribution into the
standard normal distribution.

[Figure: the original normal distribution and the standard normal distribution]

The probability distribution of Z, which has zero mean and unit variance, is called the standard
normal distribution or unit normal distribution and is denoted by N(0, 1). The distribution function of the
standard normal distribution, usually denoted by $\Phi(z)$ ($\Phi$ is capital $\phi$), is

$$\Phi(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt,$$

which has been tabulated for positive values of z (Table 9.1 contains these values). The values of $\Phi(z)$
for negative values of z are obtained from the identity $\Phi(-z) = 1 - \Phi(z)$. It should be noted that
$\Phi(0) = \tfrac{1}{2}$ and for any a and b (positive or negative) $P(a < Z < b) = \Phi(b) - \Phi(a)$.
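
In place of a printed table, $\Phi(z)$ can be evaluated through the error function, since $\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$. The sketch below uses only Python's standard `math.erf`; the function names `phi_cdf` and `prob_between` are illustrative assumptions.

```python
import math

def phi_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a, b):
    """P(a < Z < b) = Phi(b) - Phi(a) for the standard normal variable Z."""
    return phi_cdf(b) - phi_cdf(a)

print(phi_cdf(0.0))                       # Phi(0) = 1/2
print(round(phi_cdf(1.96), 4))            # a familiar table entry
print(phi_cdf(-1.2), 1 - phi_cdf(1.2))    # the identity Phi(-z) = 1 - Phi(z)
print(round(prob_between(-1, 1), 4))      # area within one standard deviation
```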


Table 9.1. Cumulative Standard Normal Probabilities $\Phi(z)$
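9.5.2 Properties of Normal Distribution. The main properties of the normal distribution are given
below:


1. The function f(x) defining the normal distribution is a proper p.d.f., i.e. $f(x) \ge 0$ and the total
area under the normal curve is unity.


Proof. Clearly f(x) is always non-negative, and the total probability (area) is

$$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx.$$

Let $z = \dfrac{x-\mu}{\sigma}$. Then $\sigma\,dz = dx$. Substituting these values, we get

$$\text{Area} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,\sigma\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz.$$

The function $e^{-z^2/2}$ being an even function of z, the integral over the negative half-line may, by
letting w = -z, be written as

$$\int_{-\infty}^{0} e^{-z^2/2}\,dz = \int_0^{\infty} e^{-w^2/2}\,dw. \quad \text{Then}$$

$$\text{Area} = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} e^{-z^2/2}\,dz.$$

Let $y = \dfrac{z^2}{2}$, so that $dz = \dfrac{dy}{\sqrt{2y}}$. Then

$$\text{Area} = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} e^{-y}\,\frac{dy}{\sqrt{2y}} = \frac{1}{\sqrt{\pi}}\int_0^{\infty} y^{-1/2} e^{-y}\,dy = \frac{\Gamma\!\left(\tfrac{1}{2}\right)}{\sqrt{\pi}} = 1.$$

Thus the total area (probability) under the normal curve is unity and hence the function f(x) defines
a p.d.f.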


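2. The mean and variance of the normal distribution are $\mu$ and $\sigma^2$ respectively.


Proof. By definition,

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx.$$

Let $z = \dfrac{x-\mu}{\sigma}$. Then $x = \mu + z\sigma$ and $dx = \sigma\,dz$.

Limits: when $x = \infty$, $z = \infty$; when $x = -\infty$, $z = -\infty$.

$$\text{Therefore } E(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (\mu + z\sigma)\, e^{-z^2/2}\,dz = \mu\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz + \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z\, e^{-z^2/2}\,dz.$$

The first integral represents $\mu$ times the area under a normal curve with zero mean and unit
variance and hence is equal to $\mu$. The second integral, being that of an odd function, equals zero. Thus

$$E(X) = \mu, \text{ i.e. the mean is } \mu.$$

$$\text{And } \mathrm{Var}(X) = E(X-\mu)^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z^2 e^{-z^2/2}\,dz.$$

To integrate by parts we use the formula $\int u\,dv = uv - \int v\,du$ and make the substitutions
$dv = z e^{-z^2/2}\,dz$ and $u = z$ so that $v = -e^{-z^2/2}$ and $du = dz$. Then

$$\mathrm{Var}(X) = \frac{\sigma^2}{\sqrt{2\pi}}\left[-z e^{-z^2/2}\right]_{-\infty}^{\infty} + \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz = 0 + \sigma^2 = \sigma^2.$$

Hence $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.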
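3. The median and the mode of the normal distribution are each equal to $\mu$, the mean of the
distribution.


Now, the median, a, is given by $\int_{-\infty}^{a} f(x)\,dx = \dfrac{1}{2}$. Therefore

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{a} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{2}$$

$$\text{or } \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(a-\mu)/\sigma} e^{-z^2/2}\,dz = \frac{1}{2}, \quad \text{on putting } z = \frac{x-\mu}{\sigma}.$$

But we know from the symmetry of the standard normal distribution that

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} e^{-z^2/2}\,dz = \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-z^2/2}\,dz = \frac{1}{2}.$$

Therefore $\dfrac{a-\mu}{\sigma} = 0$ or $a = \mu$, i.e. $\mu$ is the median of the distribution.

Again the mode, if any, is that value of x for which $f'(x) = 0$ and $f''(x) < 0$.

$$f'(x) = -\frac{(x-\mu)}{\sigma^3\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

Putting $f'(x) = 0$, we see that $x = \mu$. Differentiating again, we obtain

$$f''(x) = -\frac{1}{\sigma^3\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\left[1 - \frac{(x-\mu)^2}{\sigma^2}\right].$$

Putting $x = \mu$ in $f''(x)$, we see that $f''(x) < 0$. Thus $x = \mu$ is the mode of the normal distribution.


The median and mode are both equal to $\mu$. Since mean = median = mode, the normal distribution is
symmetrical and unimodal.


4. The mean deviation of the normal distribution is approximately $\tfrac{4}{5}$ of its standard deviation.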
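Proof. Now the mean deviation of the normal random variable X from the mean, $\mu$, is given by

$$\text{M.D.} = E|X - \mu| = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} |x - \mu|\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$

$$= \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |z|\, e^{-z^2/2}\,dz, \quad \text{where } z = \frac{x-\mu}{\sigma},$$

$$= \frac{\sigma}{\sqrt{2\pi}}\left[\int_{-\infty}^{0} -z\, e^{-z^2/2}\,dz + \int_{0}^{\infty} z\, e^{-z^2/2}\,dz\right]$$

$$= \sigma\sqrt{\frac{2}{\pi}}\left[-e^{-z^2/2}\right]_0^{\infty} = \sigma\sqrt{2/\pi} = 0.7979\,\sigma \approx \frac{4}{5}\,\sigma.$$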


5. The normal curve has points of inflection which are equidistant from the mean.

Proof. The point of inflection, by which we mean a point at which the concavity changes, is
obtained by solving the equation $f''(x) = 0$.

Differentiating the function $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, we get

$$f'(x) = -\frac{(x-\mu)}{\sigma^3\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

Equating $f'(x)$ to zero, we see that $x = \mu$. We also observe that $f'(x) > 0$ for $x < \mu$ and for
$x > \mu$, $f'(x) < 0$. Hence the maximum of the function f(x) is at $x = \mu$ and its value is $\dfrac{1}{\sigma\sqrt{2\pi}}$.


To find the points of inflection, we take the second derivative. Thus

$$f''(x) = -\frac{1}{\sigma^3\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\left[1 - \frac{(x-\mu)^2}{\sigma^2}\right].$$

Equating $f''(x)$ to zero, we are left with the following equation

$$1 - \frac{(x-\mu)^2}{\sigma^2} = 0,$$

which gives $(x-\mu)^2 = \sigma^2$

or $x - \mu = \pm\sigma$ or $x = \mu + \sigma$ or $\mu - \sigma$.


At these two points, the value of the function f(x) is $\dfrac{1}{\sigma\sqrt{2\pi e}}$.

Hence the two points of inflection of the normal curve are $\left(\mu - \sigma,\ \dfrac{1}{\sigma\sqrt{2\pi e}}\right)$ and
$\left(\mu + \sigma,\ \dfrac{1}{\sigma\sqrt{2\pi e}}\right)$. In other words, the points of inflection occur on the right and on the left of the mean
at a distance equal to the standard deviation and thus the graph of the normal curve is bell-shaped.


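6. For the normal distribution, the odd order moments about the mean are all zero and the even
order moments are given by

$$\mu_{2n} = (2n-1)(2n-3)\cdots 5\cdot 3\cdot 1\cdot\sigma^{2n}.$$

Proof. The odd order moments about the mean are

$$\mu_{2n+1} = E(X-\mu)^{2n+1} = \int_{-\infty}^{\infty} (x-\mu)^{2n+1}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$

$$= \frac{\sigma^{2n+1}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z^{2n+1} e^{-z^2/2}\,dz, \quad \text{where } z = \frac{x-\mu}{\sigma},$$

$$= 0, \text{ since the integrand is an odd function of } z.$$

The even order moments about the mean are obtained as below:

$$\mu_{2n} = E(X-\mu)^{2n} = \int_{-\infty}^{\infty} (x-\mu)^{2n}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$

$$= \frac{\sigma^{2n}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z^{2n} e^{-z^2/2}\,dz, \quad \text{on putting } z = \frac{x-\mu}{\sigma},$$

$$= \frac{2\sigma^{2n}}{\sqrt{2\pi}}\int_{0}^{\infty} z^{2n} e^{-z^2/2}\,dz.$$

Let $y = \dfrac{z^2}{2}$. Then $dy = z\,dz$, and therefore

$$\mu_{2n} = \frac{2\sigma^{2n}}{\sqrt{2\pi}}\int_0^{\infty} (2y)^{n-1/2} e^{-y}\,dy = \frac{2^n\sigma^{2n}}{\sqrt{\pi}}\int_0^{\infty} y^{n-1/2} e^{-y}\,dy = \frac{2^n\sigma^{2n}}{\sqrt{\pi}}\,\Gamma\!\left(n + \tfrac{1}{2}\right)$$

$$= \frac{2^n\sigma^{2n}}{\sqrt{\pi}}\left(n - \tfrac{1}{2}\right)\left(n - \tfrac{3}{2}\right)\cdots\tfrac{3}{2}\cdot\tfrac{1}{2}\cdot\Gamma\!\left(\tfrac{1}{2}\right) = (2n-1)(2n-3)\cdots 5\cdot 3\cdot 1\cdot\sigma^{2n}.$$

Putting n = 1 and 2, we get $\mu_2 = \sigma^2$ and $\mu_4 = 3\sigma^4$.

Hence $\beta_1 = 0$, meaning that the skewness is zero, and $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{3\sigma^4}{\sigma^4} = 3$, i.e. the normal curve is
mesokurtic. This is, in fact, the reason for the choice of the apparently arbitrary value 3 in the definition of
platy-, and lepto-kurtosis.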


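7. If X is $N(\mu, \sigma^2)$ and if Y = a + bX, then Y is $N(a + b\mu,\ b^2\sigma^2)$.


Proof. Finding expectation and variance of Y, we get

$$E(Y) = E(a + bX) = a + b\mu, \quad (\because\ a \text{ and } b \text{ are constants})$$

and $\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = b^2\sigma^2$.


Now the function Y = a + bX may be either a decreasing or increasing function according as b is
negative or positive. If X is normally distributed, then the p.d.f. of Y is given by

$$h(y) = f(x)\left|\frac{dx}{dy}\right| = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left[-\frac{1}{2\sigma^2}\left(\frac{y-a}{b} - \mu\right)^2\right]\frac{1}{|b|}$$

$$= \frac{1}{|b|\,\sigma\sqrt{2\pi}}\, \exp\!\left[-\frac{\{y - (a + b\mu)\}^2}{2b^2\sigma^2}\right],$$

which represents the p.d.f. of a r.v. that is $N(a + b\mu,\ b^2\sigma^2)$.

It follows that, if X is $N(\mu, \sigma^2)$ and if $Z = \dfrac{X-\mu}{\sigma}$, then Z is N(0, 1).

8. The sum of independent normal variables is a normal variable. Stated differently, if $X_1$ is
$N(\mu_1, \sigma_1^2)$ and $X_2$ is $N(\mu_2, \sigma_2^2)$, then for independent $X_1$ and $X_2$, $X_1 + X_2$ is
$N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$.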


9. No matter what the values of $\mu$ and $\sigma$ are, areas under the normal curve remain in certain fixed
proportions within a specified number of standard deviations on either side of $\mu$. For example,
the interval


i) $\mu \pm \sigma$ will always contain 68.26%,
ii) $\mu \pm 2\sigma$ will always contain 95.44%,
iii) $\mu \pm 3\sigma$ will always contain 99.73%.

Practically all of the area is between $\mu - 3\sigma$ and $\mu + 3\sigma$; the range of the distribution is therefore
approximately 6 standard deviations (theoretically it goes from $-\infty$ to $+\infty$), and we usually
truncate the graph at these points. This is a very important property of the normal distribution as most of
the tests of significance for large samples are based on it.
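
The fixed proportions within one, two and three standard deviations can be recomputed for any $\mu$ and $\sigma$. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` and the second pair of parameters ($\mu$ = 50, $\sigma$ = 2) are illustrative assumptions. Computed to full precision, the first two areas differ from the four-figure-table values quoted above only in the last digit.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma^2) curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # The percentage is the same whatever the values of mu and sigma
    print(k, round(100 * area_within(k), 2), round(100 * area_within(k, mu=50, sigma=2), 2))
```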
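10. The Quartile Deviation, Q, is found from

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{\mu}^{\mu+Q} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{4}$$

$$\text{or } \frac{1}{\sqrt{2\pi}}\int_{0}^{Q/\sigma} e^{-z^2/2}\,dz = \frac{1}{4}, \quad \text{where } z = \frac{x-\mu}{\sigma}.$$

We find, from the area table, that $\dfrac{Q}{\sigma} = 0.6745$ or $Q = 0.6745\,\sigma$, which is also called the Probable
Error, a term not used these days.

This also gives the values of the quartiles which are:

$Q_1 = \mu - 0.6745\,\sigma$ and $Q_3 = \mu + 0.6745\,\sigma$.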
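
The constant 0.6745 can be recovered as the 75th percentile of the standard normal distribution. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the function name `quartiles` is illustrative.

```python
from statistics import NormalDist

def quartiles(mu, sigma):
    """Lower and upper quartiles of N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(0.25), dist.inv_cdf(0.75)

q1, q3 = quartiles(0.0, 1.0)
print(round(q3, 4))               # upper quartile of N(0, 1)
print(round((q3 - q1) / 2, 4))    # quartile deviation in units of sigma
```

For a general normal variable the same call with other values of `mu` and `sigma` returns $\mu \mp 0.6745\,\sigma$.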


11. The normal curve approaches, but never really touches, the horizontal axis on either side of the
mean towards plus and minus infinity, that is the curve is asymptotic to the horizontal axis as
$x \to \pm\infty$.

Example 9.2 For a certain normal distribution, the first moment about 10 is 40 and the fourth
moment about 50 is 48. Find its mean and standard deviation.

For a normal distribution, the first moment about an arbitrary value a is given by

$$\mu'_1 = \int_{-\infty}^{\infty} (x - a)\, f(x)\,dx.$$

$$\text{Now } 40 = \int_{-\infty}^{\infty} (x - 10)\, f(x)\,dx = \int_{-\infty}^{\infty} x f(x)\,dx - 10\int_{-\infty}^{\infty} f(x)\,dx = \mu - 10, \text{ where } \mu \text{ denotes the mean.}$$

$$\therefore\ \mu = 40 + 10 = 50.$$

The fourth moment about 50 (which is the mean) implies that $\mu_4 = 48$. But for a normal distribution, the
fourth moment about the mean is $3\sigma^4$. Therefore, we have

$3\sigma^4 = 48$, which gives $\sigma = 2$.

Hence mean = 50 and standard deviation = 2.
9.5.3 Moment Generating and Cumulant aS unctions of the Normal 


Let the r.v. X be N(u, o7). Then the m.g.f. of X with re: o origin is given by 


Mo(t)= кет "m [e e 


Let 
7. Lp LE 
xa mn 
су2, 
С 
шо-47/2 
Su. 


LONE 202202 
е —(1!-2ma-t0)- rg) 
= Лл е2 dz 
27 -0 
шөре? 1 | euo? 
р Ут 


=% 


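The m.g.f. with respect to the mean, $\mu$, is

$$M_\mu(t) = E\left[e^{t(X-\mu)}\right] = e^{-\mu t}\,E(e^{tX}) = e^{-\mu t}\, e^{\mu t + \sigma^2 t^2/2} = e^{\sigma^2 t^2/2}$$

$$= 1 + \frac{\sigma^2 t^2/2}{1!} + \frac{(\sigma^2 t^2/2)^2}{2!} + \cdots + \frac{(\sigma^2 t^2/2)^n}{n!} + \cdots$$

Hence $\mu_{2n+1} = 0$; and

$$\mu_{2n} = \text{coefficient of } \frac{t^{2n}}{(2n)!} = \frac{\sigma^{2n}}{2^n\, n!}\,(2n)! = 1 \cdot 3 \cdot 5 \cdots (2n-1)\,\sigma^{2n}, \quad\text{for all } n > 0.$$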
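The closed form for the central moments can be checked against a direct numerical integration. The following is a minimal sketch, not part of the original text: the function names are ours, the mean is taken as 0 without loss of generality, and a simple midpoint rule over $\pm 12\sigma$ stands in for the exact integral.

```python
import math

def normal_central_moment(r, sigma):
    """r-th moment about the mean of N(mu, sigma^2), read off the series for exp(sigma^2 t^2 / 2)."""
    if r % 2 == 1:
        return 0.0  # odd-order central moments vanish
    n = r // 2
    return math.factorial(2 * n) * sigma ** (2 * n) / (2 ** n * math.factorial(n))

def numeric_central_moment(r, sigma, steps=20_000, width=12.0):
    """Midpoint-rule approximation of the same moment, integrating over +/- width*sigma."""
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = -width * sigma + (i + 0.5) * h
        pdf = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        total += x ** r * pdf * h
    return total

# Fourth central moment with sigma = 2, i.e. 3 * sigma^4 as used in Example 9.2.
print(normal_central_moment(4, 2.0), numeric_central_moment(4, 2.0))
```

With $\sigma = 2$ the formula gives $3\sigma^4 = 48$, the value used in Example 9.2.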


The cumulant generating function is given by

$$\kappa(t) = \log_e M_0(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.$$

Thus $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$ and $\kappa_r = 0$ for $r \ge 3$, i.e. all the cumulants after the second are equal to zero.


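Example 9.3 If the p.d.f. of the r.v. X is

$$f(x) = \frac{1}{\sqrt{32\pi}}\, e^{-(x+7)^2/32}, \quad -\infty < x < \infty,$$

find the mean, variance and moment generating function.

The function f(x) may be written as

$$f(x) = \frac{1}{4\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x+7}{4}\right)^2}.$$

Comparing with the general form of the normal distribution, we find that $\mu = -7$ and $\sigma^2 = 16$. Hence X is $N(-7, 16)$.

Substituting the values of mean and variance in the relation $\text{m.g.f.} = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$, we get

$$\text{m.g.f.} = e^{-7t + \frac{1}{2}(16)t^2} = e^{-7t + 8t^2}.$$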


Example 9.4 Let X = N(0, 1) mean that X has a normal distribution with zero mean and unit variance. What will be the distribution of $2X - 3$, $\tfrac{1}{2}X + 5$ and $4X$?

Given that E(X) = 0 and Var(X) = 1.
Let $Y = 2X - 3$. Then $E(Y) = E(2X - 3) = 2E(X) - 3 = -3$, and
$\mathrm{Var}(Y) = \mathrm{Var}(2X - 3) = 2^2\,\mathrm{Var}(X) = 4(1) = 4$.
Hence the distribution of $2X - 3$ is $N(-3, 4)$.

Similarly, we find that $\tfrac{1}{2}X + 5$ is $N(5, \tfrac{1}{4})$ and $4X$ is $N(0, 16)$.


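Example 9.5 If X is N(0, 1), then find the distribution of $\tfrac{1}{2}X^2$.

The distribution of X is

$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx, \quad -\infty < x < \infty.$$

Let $y = x^2$ so that $x = \sqrt{y}$ and $dx = \dfrac{1}{2\sqrt{y}}\,dy$. Since $x$ and $-x$ give the same value of y, the distribution of Y is

$$dF = 2 \cdot \frac{1}{\sqrt{2\pi}}\, e^{-y/2}\, \frac{1}{2\sqrt{y}}\,dy = \frac{1}{\sqrt{2\pi}}\, y^{-1/2}\, e^{-y/2}\,dy, \quad 0 \le y < \infty,$$

$$= \frac{1}{\Gamma(1/2)}\, e^{-m}\, m^{\frac{1}{2}-1}\,dm, \quad\text{where } m = \frac{y}{2},\ 0 \le m < \infty.$$

This is a Gamma variable with parameter $\tfrac{1}{2}$.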
9.5.4 Tabulated Area of Normal Distribution. The area under the normal curve between ordinates at X = a and X = b equals the probability that the r.v. X lies in the interval (a, b]. That is

$$P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx,$$

which is represented by the area of the shaded region (see figure).

But integrals of this type cannot be solved by ordinary means. They are, however, evaluated by methods of numerical integration, and numerical approximations for some functions have been tabulated for quick reference.


Table 9.2 on page (365) gives the areas (probabilities) for the standard normal distribution from the mean, z = 0, to a specified positive value of z, say $z_0$. Since normal curves are symmetrical, P(0 to z) = P(0 to −z). That is why the areas for negative values of z are not tabulated. It is important to note that this single table for the standard normal distribution suffices for the calculation of probabilities for any normal distribution. Hence, to use the table of areas for the standard normal distribution, the values of the r.v. X in any problem are changed to the values of the standard normal variable Z and the desired probabilities are obtained from Table 9.2. Thus, to find P(a < X < b), we would change X into Z as follows:

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right),$$

where $\dfrac{a-\mu}{\sigma}$ and $\dfrac{b-\mu}{\sigma}$ are the z-values of the standard normal variable Z. In practice, a normal curve sketch for the given problem, showing under the X-scale a scale for the corresponding values of z, will help in solving the problem.
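The change of variable described above can be sketched in a few lines of Python. This is an illustration only: `prob_between` is a hypothetical helper name, and the standard library's `statistics.NormalDist` stands in for Table 9.2.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), computed through the standard normal variable Z."""
    za = (a - mu) / sigma  # z-value of the lower limit
    zb = (b - mu) / sigma  # z-value of the upper limit
    std = NormalDist()     # standard normal: mean 0, standard deviation 1
    return std.cdf(zb) - std.cdf(za)

# Illustrative values mu = 50, sigma = 5 (the distribution used in Example 9.7).
print(round(prob_between(55, 100, 50, 5), 4))
print(round(prob_between(0, 40, 50, 5), 4))
```

Only the z-values depend on $\mu$ and $\sigma$; the same standard normal distribution serves every problem, which is why a single table suffices.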
Table 9.2 Areas under the Unit Normal Curve
Example 9.6 Let the r.v. Z have the standard normal distribution. Find
i) $P(0 \le Z \le 1.20)$,
ii) $P(-1.65 \le Z \le 0)$,
iii) $P(0.6 \le Z \le 1.67)$,
iv) $P(-1.30 \le Z \le 2.18)$,
v) $P(-1.96 \le Z \le -0.84)$,
vi) $P(Z \ge 1.96)$, and
vii) $P(Z \le -2.15)$.
First we draw the normal curve sketch, shading the desired area (probability) for each part.


i) To find $P(0 \le Z \le 1.20)$, in Table 9.2 page (365) we move downward in the column marked Z until 1.2 is reached, and then move across that row to the column headed 0.00 to find the entry 0.3849. Therefore $P(0 \le Z \le 1.20) = 0.3849$.

ii) Since the normal curve is symmetrical about the mean, the area between z = 0 and a positive value of z is equal to the area between z = 0 and a negative value of z of the same magnitude.
Hence, using Table 9.2 page (365), we have
$P(-1.65 \le Z \le 0) = P(0 \le Z \le 1.65) = 0.4505$.

iii) $P(0.6 \le Z \le 1.67)$
$= P(0 \le Z \le 1.67) - P(0 \le Z \le 0.6)$
$= 0.4525 - 0.2257$
$= 0.2268$ (From area tables)

iv) $P(-1.30 \le Z \le 2.18)$
$= P(-1.30 \le Z \le 0) + P(0 \le Z \le 2.18)$
$= 0.4032 + 0.4854$
$= 0.8886$ (From area tables)

v) $P(-1.96 \le Z \le -0.84)$
$= P(-1.96 \le Z \le 0) - P(-0.84 \le Z \le 0)$
$= 0.4750 - 0.2995$
$= 0.1755$ (From area tables)

vi) $P(Z \ge 1.96) = 0.5 - P(0 \le Z \le 1.96)$
$= 0.5 - 0.4750$
$= 0.0250$

vii) $P(Z \le -2.15) = 0.5 - P(-2.15 \le Z \le 0)$
$= 0.5 - 0.4842$
$= 0.0158$
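The seven answers of Example 9.6 can be reproduced by mimicking the table: every probability is built from areas measured from z = 0. This is a sketch under the assumption that `statistics.NormalDist` is an acceptable stand-in for Table 9.2; the helper `area_0_to` is our own name.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def area_0_to(z):
    """Area between 0 and z, as tabulated; by symmetry only |z| matters."""
    return STD.cdf(abs(z)) - 0.5

results = {
    "i": area_0_to(1.20),
    "ii": area_0_to(-1.65),
    "iii": area_0_to(1.67) - area_0_to(0.6),     # both limits on the same side: subtract
    "iv": area_0_to(-1.30) + area_0_to(2.18),    # limits on opposite sides: add
    "v": area_0_to(-1.96) - area_0_to(-0.84),
    "vi": 0.5 - area_0_to(1.96),                 # upper tail
    "vii": 0.5 - area_0_to(-2.15),               # lower tail
}
for part, value in results.items():
    print(part, round(value, 4))
```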

Example 9.7 A random variable X is normally distributed with $\mu = 50$ and $\sigma^2 = 25$. Find the probability (a) that it will fall between (i) 0 and 40, (ii) 55 and 100; (b) that it will be (i) larger than 54, (ii) less than 57.

We draw the normal curve sketch showing x and z values, and the desired area for each part. With $\mu = 50$ and $\sigma = 5$, we have

a) i) At x = 0, we compute $z = \dfrac{0-50}{5} = -10$, and at x = 40, we find $z = \dfrac{40-50}{5} = -2.0$. Using Table 9.2 page (365), we have
$P(0 < X < 40) = P(-10 < Z < -2)$
$= P(-10 < Z < 0) - P(-2 < Z < 0)$
$= 0.5 - 0.4772 = 0.0228$

ii) For x = 55, we have $z = \dfrac{55-50}{5} = 1.0$, and for x = 100, $z = \dfrac{100-50}{5} = 10.0$. The x values and the corresponding z values are shown in the figure.
Using Table 9.2, we have
$P(55 < X < 100) = P(1.0 < Z < 10.0)$
$= P(0 < Z < 10.0) - P(0 < Z < 1.0)$
$= 0.5 - 0.3413 = 0.1587$

b) i) With $\mu = 50$ and $\sigma = 5$, we have for x = 54, $z = \dfrac{54-50}{5} = 0.80$.
Hence using Table 9.2, we have
$P(X \ge 54) = P(Z \ge 0.8)$
$= 0.5 - P(0 \le Z \le 0.8)$
$= 0.5 - 0.2881 = 0.2119$.

ii) At x = 57, $z = \dfrac{57-50}{5} = 1.40$.
Therefore using Table 9.2, we have
$P(X < 57) = P(Z < 1.40)$
$= 0.5 + P(0 \le Z \le 1.40)$
$= 0.5 + 0.4192$
$= 0.9192$.


Example 9.8 The length of life for an automatic dishwasher is approximately normally distributed with a mean of 3.5 years and a standard deviation of 1.0 years. If this type of dishwasher is guaranteed for 12 months, what fraction of the sales will require replacement?

The fraction of sales requiring replacement is equal to the area under the normal curve for X < 1 year, the guaranteed period.

With $\mu = 3.5$ and $\sigma = 1.0$, we first transform x = 1.0 year to the corresponding z value. Thus,

$$z = \frac{1.0 - 3.5}{1.0} = -2.5.$$

The x value and the corresponding z value are indicated in the figure. From Table 9.2,
$P(X < 1.0) = P(Z < -2.5)$
$= 0.5 - P(-2.5 < Z < 0)$
$= 0.5 - 0.4938 = 0.0062$.
Hence 0.62% of sales need replacement before 12 months.


Example 9.9 The mean height of soldiers is 68.22 inches with a variance of 10.8 (in.)². Assuming the distribution of heights to be normal, how many soldiers in a regiment of 1000 would you expect to be over 6 feet tall?

With $\mu = 68.22$ and $\sigma^2 = 10.8$ (in.)², we first compute the z value.

At x = 72 (6 ft.), we have

$$z = \frac{72 - 68.22}{\sqrt{10.8}} = \frac{3.78}{3.29} = 1.15.$$

The x value and the corresponding z value are shown in the normal curve sketch, and we need the right tail area, which is shaded.

Therefore using Table 9.2 page (365), we find
$P(X > 72) = P(Z \ge 1.15) = 0.5 - P(0 < Z < 1.15)$
$= 0.5 - 0.3749 = 0.1251$

If there are 1,000 soldiers in the regiment, then the number expected to be over 6 feet (or 72 inches) is
$1000 \times 0.1251 = 125$.


Example 9.10 If the moment generating function of X is $M(t) = e^{166t + 200t^2}$, find (i) $P(170 < X < 200)$, (ii) $P(148 < X < 172)$.

Comparing $M(t) = e^{166t + 200t^2}$ with the m.g.f. of $N(\mu, \sigma^2)$, we find $\mu = 166$ and $\sigma^2 = 400$.

To find the desired probabilities, we transform x values to z values by $z = \dfrac{x - 166}{20}$. Therefore

i) At x = 170, we get $z = \dfrac{170-166}{20} = 0.2$, and at x = 200 we find $z = \dfrac{200-166}{20} = 1.7$.

Hence using Table 9.2 page (365), we find
$P(170 < X < 200) = P(0.2 < Z < 1.7) = P(0 < Z < 1.7) - P(0 < Z < 0.2)$
$= 0.4554 - 0.0793 = 0.3761$

ii) The figure illustrates the problem. At x = 148, we compute $z = \dfrac{148-166}{20} = -0.9$, and at x = 172, we find $z = \dfrac{172-166}{20} = +0.3$.

Using Table 9.2 page (365), we get
$P(148 < X < 172) = P(-0.9 < Z < 0.3)$
$= P(-0.9 < Z < 0) + P(0 < Z < 0.3)$
$= 0.3159 + 0.1179 = 0.4338$
9.5.5 Inverse Use of Table of Areas under Normal Curve. When we use the Table of areas under the normal curve to find the value of z corresponding to a given probability in the body of the table, it is called the inverse use of the area Table. The value of z given that $P(0 < Z < z) = k$ is denoted by the symbol $(z \mid P = k)$. If the given probability does not appear in the table, the closest probability may be taken to find the corresponding value of z. The z value thus obtained is then converted to an x value, using the formula

$$z = \frac{x - \mu}{\sigma},$$

which may be written as $x = \mu + z\sigma$.

The value of z is positive when x lies to the right of $\mu$ and it is negative when x lies to the left of $\mu$. A sketch of the standard normal curve showing the given probability and the location of x on the correct side of $\mu$ helps in solving the given problem.
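The inverse use of the table corresponds to evaluating the inverse of the cumulative distribution function. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist.inv_cdf` is available; the helper name `percentile` and the illustrative values $\mu = 30$, $\sigma = 10$ are our own choices.

```python
from statistics import NormalDist

def percentile(p, mu, sigma):
    """Return x such that P(X <= x) = p for X ~ N(mu, sigma^2), via x = mu + z*sigma."""
    z = NormalDist().inv_cdf(p)  # negative when p < 0.5, i.e. x lies to the left of mu
    return mu + z * sigma

# Illustrative values: mu = 30 and sigma = 10.
print(round(percentile(0.10, 30, 10), 1))
print(round(percentile(0.90, 30, 10), 1))
```

Because `inv_cdf` works with the whole area to the left of x rather than the area measured from the mean, the sign of z comes out automatically and no separate left/right rule is needed.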


Example 9.11 In a normal distribution $\mu = 40$ and $P(25 < X < 55) = 0.8662$. Find $P(20 \le X \le 60)$.

Given $\mu = 40$ and $P(25 < X < 55) = 0.8662$, where X is a normal random variable.

We find that 25 and 55 lie on either side of the mean $\mu = 40$ at equal distance, so that
P(between X = 25 and $\mu = 40$) = P(between $\mu = 40$ and X = 55) = 0.4331.

Therefore, using Table 9.2 page (365), we find $(z \mid P = 0.4331) = 1.50$.

Substitution in the relation $z = \dfrac{x - \mu}{\sigma}$ gives

$$1.50 = \frac{55 - 40}{\sigma}, \quad\text{which yields } \sigma = 10.$$

Now $P(20 < X < 60) = P\left(\dfrac{20-40}{10} < Z < \dfrac{60-40}{10}\right) = P(-2 < Z < 2)$
$= P(-2 < Z < 0) + P(0 < Z < 2)$
$= 0.4772 + 0.4772 = 0.9544$
Example 9.12 The time required by a nurse to inject a shot of penicillin has been observed to be normally distributed, with a mean of $\mu = 30$ seconds and a standard deviation of $\sigma = 10$ seconds. Find the following percentiles: (i) 10th, (ii) 90th.

i) Let x be the point $P_{10}$, where $P_{10}$ is a point at or below which 10% of the area lies. Then the area to the left of x is 0.1 and the area between $\mu$ and x is 0.5 − 0.1 = 0.4.
Looking at Table 9.2 page (365) we find that a probability of 0.4 does not appear in the table, so we take the closest probability to 0.4, which is 0.3997.
Thus, using Table 9.2 inversely we find $(z \mid P = 0.3997) = 1.28$.
Since x lies to the left of $\mu$, z is negative at this point. This gives $z = -1.28$.
Hence $x = \mu + z\sigma = 30 + (-1.28)(10) = 17.2$ seconds.

ii) Let x be the point $P_{90}$, where $P_{90}$ is a point at or below which 90% of the area lies. Then the area to the left of x is 0.9 and the area between $\mu$ and x is 0.9 − 0.5 = 0.4.
From Table 9.2 page (365), we find $(z \mid P = 0.4) = 1.28$. Since x lies to the right of $\mu$, z is positive at this point.
Hence $x = \mu + z\sigma = 30 + (1.28)(10) = 42.8$ seconds.

Example 9.13 In a normal distribution with $\mu = 13.5$ and $\sigma = 3.6$, find two points such that a single observation has a 95% chance of falling between them.

Let $x_1$ and $x_2$ be the two points between which the probability of an observation falling is 0.95. As the curve is symmetrical, half of 0.95, i.e. 0.475, is the area lying on either side of $\mu$.

Using Table 9.2 inversely, we find
$(z \mid P = 0.475) = 1.96$
The point $x_1$ is to the left of $\mu$, so $z = -1.96$, and at $x_2$, to the right of $\mu$, $z = 1.96$. Therefore
$x_1 = \mu + z\sigma = 13.5 + (-1.96)(3.6) = 6.444$
$x_2 = \mu + z\sigma = 13.5 + (1.96)(3.6) = 20.556$


Example 9.14 An athlete finds that in the high jump he can clear a height of 1.68 m once in five attempts and a height of 1.52 m nine times out of ten attempts. Assuming the heights he can clear in various jumps form a normal distribution, estimate the mean and standard deviation of the distribution.

Let X denote the height the athlete can clear in various jumps. Then X is $N(\mu, \sigma^2)$ where $\mu$ and $\sigma$ are unknown.

There is a probability of $\tfrac{1}{5} = 0.2$ that he can clear a height of 1.68 m, i.e. $P(X > 1.68) = 0.2$, and a probability of $\tfrac{9}{10} = 0.9$ that he can clear a height of 1.52 m, that is $P(X > 1.52) = 0.9$, implying that $P(X < 1.52) = 0.1$.

The probability (area) between $\mu$ and x = 1.68 is 0.5 − 0.2 = 0.3, and between $\mu$ and x = 1.52 is 0.5 − 0.1 = 0.4.

Using Table 9.2 page (365) inversely we find that $(z \mid P = 0.3) = 0.84$. As x = 1.68 lies to the right of $\mu$, z is positive at this point.

Also $(z \mid P = 0.4) = 1.28$; since x = 1.52 lies to the left of $\mu$, z is negative at this point.

Substitution in the relation $x = \mu + z\sigma$ gives

$\mu + 0.84\sigma = 1.68$
$\mu - 1.28\sigma = 1.52$

Subtracting, we get $2.12\sigma = 0.16$ or $\sigma = 0.075$.
Putting $\sigma = 0.075$ in $\mu + 0.84\sigma = 1.68$, we obtain

$\mu + 0.84(0.075) = 1.68$
or $\mu = 1.68 - 0.063 = 1.617$

Hence the estimated value of $\mu$ is 1.617 and of $\sigma$ is 0.075.
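The two simultaneous equations of this example can also be solved programmatically. The sketch below is an illustration rather than the textbook's method: the function name is ours, the inputs are cumulative probabilities $P(X \le x)$, and `inv_cdf` replaces the inverse table lookup, so the z-values are not rounded to two decimals.

```python
from statistics import NormalDist

def fit_from_two_points(x1, p1, x2, p2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma, where zi is the z-value with P(Z <= zi) = pi."""
    std = NormalDist()
    z1, z2 = std.inv_cdf(p1), std.inv_cdf(p2)
    sigma = (x1 - x2) / (z1 - z2)  # subtract the two equations to eliminate mu
    mu = x1 - z1 * sigma           # back-substitute into the first equation
    return mu, sigma

# P(X > 1.68) = 0.2 means P(X <= 1.68) = 0.8; P(X > 1.52) = 0.9 means P(X <= 1.52) = 0.1.
mu, sigma = fit_from_two_points(1.68, 0.8, 1.52, 0.1)
print(round(mu, 3), round(sigma, 3))
```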


Example 9.15 A collection of human skulls is divided into three classes A, B and C according to the value of a "length-breadth index" X. Skulls with X < 75 are classified as A (long-headed), those with 75 < X < 80 as B (medium) and those with X > 80 as C (short-headed). The percentages in the three classes in this collection are 58, 38 and 4. Find approximately the mean and the standard deviation of X on the assumption that X is normally distributed.

Let $\mu$ and $\sigma$ be the mean and the standard deviation of the normal distribution respectively. Then the area for class A, whose length-breadth index is under 75, is 0.58; the area for class B, whose index lies between 75 and 80, is 0.38; and the area for class C, whose index is over 80, is 0.04. The accompanying sketch of the curve shows the given information.

The area between $\mu$ and X = 75 is 0.58 − 0.50 = 0.08, while the area between $\mu$ and X = 80 is 0.08 + 0.38 = 0.46. In Table 9.2, we find that these probabilities do not appear there, so we take the closest probabilities to 0.08 and 0.46, which are 0.0793 and 0.4599. Therefore

$(z \mid P = 0.0793) = 0.20$ and $(z \mid P = 0.4599) = 1.75$
These z values are positive as the x values lie to the right of $\mu$.
Since $x = \mu + z\sigma$, we get
$\mu + 0.20\sigma = 75$, and
$\mu + 1.75\sigma = 80$.
Solving these two equations simultaneously, we obtain
$\mu = 74.4$ and $\sigma = 3.23$.


Example 9.16 A lawyer commutes daily from his suburban home to his midtown office. On the average, the trip one way takes 24 minutes, with a standard deviation of 3.8 minutes. Assume the distribution of trip times to be normally distributed.

a) What is the probability that a trip will take at least ½ hour?
b) If the office opens at 9:00 AM and he leaves his house at 8:45 AM daily, what percentage of the time is he late for work?
c) If he leaves the house at 8:35 AM and coffee is served at the office from 8:50 AM until 9:00 AM, what is the probability that he misses coffee?
d) Find the length of time above which we find the slowest 15% of the trips.
e) Find the probability that 2 of the next 3 trips will take at least ½ hour.
(P.U., M.Sc. 1989; I.U., M.Sc. 1986, 93)

Let the r.v. X denote the trip time in minutes. Then X is $N[24, (3.8)^2]$.

a) We need to calculate the probability that a trip will take at least ½ hour, i.e. $P(X \ge 30)$.

$$P(X \ge 30) = P\left(Z \ge \frac{30 - 24}{3.8}\right) = P(Z \ge 1.58) = 0.0571$$

b) He leaves home at 8:45 AM and the office opens at 9:00 AM, which implies that he has 15 minutes to reach the office. He will be late for work if he takes more than 15 minutes. Thus we need $P(X > 15)$.

$$P(X > 15) = P\left(Z > \frac{15 - 24}{3.8}\right) = P(Z > -2.37) = 0.5 + P(0 < Z < 2.37) = 0.9911$$

The percentage of the time he will be late for work is 99.11%.

c) He leaves home at 8:35 AM and coffee is served from 8:50 AM until 9:00 AM. He will miss coffee if he reaches the office after 9:00 AM, i.e. if he takes 25 minutes or more. Thus we need $P(X \ge 25)$.

$$P(X \ge 25) = P\left(Z \ge \frac{25 - 24}{3.8}\right) = P(Z \ge 0.26) = 0.5 - P(0 < Z < 0.26) = 0.5 - 0.1026 = 0.3974$$

d) To calculate the length of time (x) above which we find the slowest 15% of the trips, we need to calculate the value of x such that $P(X \ge x) = 0.15$.

$$P(X \ge x) = 0.15 \quad\text{or}\quad P\left(Z \ge \frac{x - 24}{3.8}\right) = 0.15$$

We find from the Tables of the Standard Normal distribution that the value of z corresponding to a tail area of 0.15 is 1.04. Hence

$x = \mu + z\sigma = 24 + (1.04)(3.8) = 27.952$ minutes

e) Using the binomial distribution with n = 3 and p = 0.0571 (where p is the probability that a trip will take at least ½ hour), we find that the probability that 2 out of the next 3 trips will take at least ½ hour is

$$\binom{3}{2} (0.0571)^2 (0.9429) = 0.0092.$$
9.5.6 Normal Approximation to the Binomial Distribution. The binomial distribution b(x; n, p) can be closely approximated by the normal distribution when n is sufficiently large and neither p nor q is too close to zero. As a rule of thumb, the normal distribution provides a reasonable approximation to the binomial distribution if both np and nq are equal to or greater than 5.

The probability for a binomial random variable X to take the value x is

$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad\text{for } 0 \le x \le n \text{ and } q + p = 1.$$

The variable X has a mean of $\mu = E(X) = np$ and $\mathrm{Var}(X) = npq$.
To derive the limiting form, we define a new random variable Z by the relation

$$Z = \frac{X - np}{\sqrt{npq}}.$$

Now, obviously E(Z) = 0 and Var(Z) = 1.
Thus we have to show that Z is normally distributed in the limit as $n \to \infty$ and p is fixed. This is actually the de Moivre-Laplace theorem. We use the moment generating function to prove the theorem.
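The m.g.f. of Z is

$$M_Z(t) = E(e^{tZ}) = E\left[e^{t(X-np)/\sqrt{npq}}\right]
= e^{-npt/\sqrt{npq}}\; E\left[e^{tX/\sqrt{npq}}\right]
= e^{-npt/\sqrt{npq}} \left(q + p\,e^{t/\sqrt{npq}}\right)^n
= \left[q\,e^{-pt/\sqrt{npq}} + p\,e^{qt/\sqrt{npq}}\right]^n.$$

Expanding the exponentials, we get

$$M_Z(t) = \left[ q\left(1 - \frac{pt}{\sqrt{npq}} + \frac{p^2t^2}{2!\,npq} - \frac{p^3t^3}{3!\,(npq)^{3/2}} + \frac{p^4t^4}{4!\,(npq)^2} - \cdots\right)
+ p\left(1 + \frac{qt}{\sqrt{npq}} + \frac{q^2t^2}{2!\,npq} + \frac{q^3t^3}{3!\,(npq)^{3/2}} + \frac{q^4t^4}{4!\,(npq)^2} + \cdots\right) \right]^n$$

$$= \left[1 + \frac{t^2}{2n} + \frac{(q-p)\,t^3}{3!\,n\sqrt{npq}} + \frac{(1-3pq)\,t^4}{4!\,n^2pq} + \cdots \right]^n.$$

Taking logs to the base e, we have

$$\log_e M_Z(t) = n \log_e\left[1 + \frac{t^2}{2n} + \frac{(q-p)\,t^3}{3!\,n\sqrt{npq}} + \frac{(1-3pq)\,t^4}{4!\,n^2pq} + \cdots \right]$$

$$= \frac{t^2}{2} + \frac{(q-p)\,t^3}{3!\,\sqrt{npq}} + \frac{(1-6pq)\,t^4}{4!\,npq} + \cdots$$

Therefore

$$\lim_{n\to\infty} \log_e M_Z(t) = \frac{t^2}{2} \quad\text{or}\quad \lim_{n\to\infty} M_Z(t) = e^{t^2/2}.$$

We know that $e^{t^2/2}$ is the m.g.f. of a standard normal variable.
Hence we find that in the limit Z has a standard normal distribution.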


It is important to note that a binomial variable is discrete, whereas the normal curve probability is a probability for an interval. Therefore in using normal curve areas to approximate binomial probabilities, a discrete value of the binomial variable is to be replaced by an interval before the z values are computed. Thus, a discrete value x becomes the interval from x − 0.5 to x + 0.5; this sort of adjustment is called the continuity correction. Thus, the discrete value 5, adjusted, means 4.5 to 5.5.
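The effect of the continuity correction can be seen by comparing the exact binomial probability with its normal approximation. The sketch below is an illustration with helper names of our own; it uses n = 20 and p = 0.5 (a fair coin tossed 20 times) as sample values and does not round the z-values, so the approximate figure differs slightly from one obtained with a two-decimal table.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_exact(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for a binomial variable X with parameters n and p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(lo, hi + 1))

def binom_normal_approx(lo, hi, n, p):
    """Normal approximation: the discrete range lo..hi becomes the interval lo - 0.5 to hi + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

exact = binom_exact(10, 14, 20, 0.5)
approx = binom_normal_approx(10, 14, 20, 0.5)
print(round(exact, 4), round(approx, 4))
```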


Example 9.17 A fair coin is tossed 20 times. Find the probability that the number of heads occurring is between 10 and 14 inclusive by using (a) the binomial distribution, (b) the normal approximation to the binomial distribution.

a) Let X denote the number of heads occurring. Then the p.d. of X is

$$b(x; 20, \tfrac{1}{2}) = \binom{20}{x} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{20-x}.$$

The desired probability is $P(10 \le X \le 14)$.

Now

$$P(10 \le X \le 14) = \sum_{x=10}^{14} \binom{20}{x} \left(\tfrac{1}{2}\right)^{20}$$

$= 0.1762 + 0.1602 + 0.1201 + 0.0739 + 0.0370$
$= 0.5674$

Alternatively

$$P(10 \le X \le 14) = \sum_{x=10}^{14} b(x; 20, 0.5) = \sum_{x=0}^{14} b(x; 20, 0.5) - \sum_{x=0}^{9} b(x; 20, 0.5)$$

$= 0.9793 - 0.4119$ (from binomial tables)
$= 0.5674$

b) Since np = 20(0.5) = 10 > 5 and nq = 20(0.5) = 10 > 5, we will use the normal distribution to approximate the binomial distribution.
Now $\mu = np = 20(0.5) = 10$, and

$\sigma = \sqrt{npq} = \sqrt{20(0.5)(0.5)} = 2.24$.

For the normal approximation, the interval of discrete values $10 \le X \le 14$ is replaced by the interval 9.5 < X < 14.5. We compute the z values as below:

At x = 9.5, we find $z = \dfrac{9.5 - 10}{2.24} = -0.22$, and
at x = 14.5, we get $z = \dfrac{14.5 - 10}{2.24} = +2.01$.

Hence, using Table 9.2,
$P(10 \le X \le 14) = P(-0.22 < Z < 2.01)$
$= P(-0.22 < Z < 0) + P(0 < Z < 2.01)$
$= 0.0871 + 0.4778 = 0.5649$
Hence the probability of obtaining between 10 and 14 heads is 0.5649.


Example 9.18 A pair of dice is rolled 180 times. Use the normal approximation method to find the probability that a total of 7 occurs (i) at least 25 times, (ii) between 33 and 41 times inclusive, (iii) exactly 30 times.

Let X denote the number of times a total of 7 occurs when a pair of dice is rolled. Then X is a binomial variable with n = 180 and $p = \tfrac{1}{6}$ (probability of getting a total of 7 with 2 dice).
Since n is large and p is not too small, we use the normal approximation method with

$\mu = np = 180 \times \tfrac{1}{6} = 30$, and

$\sigma = \sqrt{npq} = \sqrt{180 \times \tfrac{1}{6} \times \tfrac{5}{6}} = 5$

to find the desired probabilities.

i) The interval "at least 25" includes 25; therefore it starts at 24.5 and extends to $\infty$, i.e. at least 25 becomes the interval 24.5 to $\infty$.

The corresponding z value is $z = \dfrac{24.5 - 30}{5} = -1.1$.

Hence, using Table 9.2, we find

$$P(\text{at least } 25) = \sum_{x=25}^{180} b(x; 180, \tfrac{1}{6}) \approx P(-1.1 < Z < \infty)$$

$= P(-1.1 < Z < 0) + P(0 < Z < \infty)$
$= 0.3643 + 0.5 = 0.8643$
Hence the probability of obtaining a seven at least 25 times is 0.8643.

ii) The interval of discrete values $33 \le X \le 41$ is replaced by the interval 32.5 < X < 41.5, and the corresponding z values are:

at x = 32.5, $z = \dfrac{32.5 - 30}{5} = 0.5$, and
at x = 41.5, $z = \dfrac{41.5 - 30}{5} = 2.3$.

Hence using Table 9.2, we find
$P(33 \le X \le 41) = P(0.5 < Z < 2.3)$
$= P(0 < Z < 2.3) - P(0 < Z < 0.5)$
$= 0.4893 - 0.1915 = 0.2978$.

iii) The discrete value 30, adjusted for continuity, becomes the interval 29.5 to 30.5, and the corresponding z values are:
at x = 29.5, $z = \dfrac{29.5 - 30}{5} = -0.1$, and
at x = 30.5, $z = \dfrac{30.5 - 30}{5} = +0.1$.
Hence using Table 9.2, we obtain

$P(X = 30) = P(-0.1 < Z < 0.1)$
$= 2(0.0398) = 0.0796$.
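9.5.7 Normal Approximation to the Poisson Distribution. The Poisson distribution $p(x; \mu)$ can also be approximated by the normal distribution when $\mu \to \infty$. The probability for a Poisson r.v. X to take the value x is

$$P(X = x) = \frac{e^{-\mu}\mu^x}{x!}, \quad\text{for } x = 0, 1, 2, \ldots, \infty$$

and $E(X) = \mathrm{Var}(X) = \mu$.

Let $Z = \dfrac{X - \mu}{\sqrt{\mu}}$, so that E(Z) = 0 and Var(Z) = 1.

In order to show that as $\mu \to \infty$ the limiting distribution of Z is the standard normal distribution, we use the moment generating function.

Now, the m.g.f. of Z is

$$M_Z(t) = E(e^{tZ}) = E\left[e^{t(X-\mu)/\sqrt{\mu}}\right]
= e^{-t\sqrt{\mu}}\; E\left[e^{tX/\sqrt{\mu}}\right]
= e^{-t\sqrt{\mu}}\, e^{-\mu} \sum_{x=0}^{\infty} \frac{\left(\mu\, e^{t/\sqrt{\mu}}\right)^x}{x!}
= e^{-t\sqrt{\mu}}\, e^{-\mu}\, e^{\mu e^{t/\sqrt{\mu}}}.$$

Taking logs to the base e, we get

$$\log_e M_Z(t) = -t\sqrt{\mu} - \mu + \mu\, e^{t/\sqrt{\mu}}
= -t\sqrt{\mu} - \mu + \mu\left(1 + \frac{t}{\sqrt{\mu}} + \frac{t^2}{2!\,\mu} + \frac{t^3}{3!\,\mu^{3/2}} + \cdots\right)
= \frac{t^2}{2!} + \frac{t^3}{3!\,\sqrt{\mu}} + \cdots$$

Therefore

$$\lim_{\mu\to\infty} \log_e M_Z(t) = \frac{t^2}{2} \quad\text{or}\quad \lim_{\mu\to\infty} M_Z(t) = e^{t^2/2},$$

which is the m.g.f. of the standard normal distribution.
Hence we see that the Poisson distribution approaches the normal distribution as $\mu \to \infty$.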
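How close the approximation is for a moderate mean can be checked directly. The sketch below is illustrative only: the helper names are ours, the mean $\mu = 25$ is a sample value, and the continuity correction of Section 9.5.6 is applied in the same way as for the binomial.

```python
from math import exp, sqrt
from statistics import NormalDist

def poisson_exact(lo, hi, mu):
    """Exact P(lo <= X <= hi) for a Poisson variable with mean mu."""
    term = exp(-mu)                    # P(X = 0)
    total = term if lo <= 0 else 0.0
    for x in range(1, hi + 1):
        term *= mu / x                 # recurrence P(X = x) = P(X = x - 1) * mu / x
        if x >= lo:
            total += term
    return total

def poisson_normal_approx(lo, hi, mu):
    """Normal approximation N(mu, mu) with the continuity correction lo - 0.5 to hi + 0.5."""
    dist = NormalDist(mu, sqrt(mu))
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

exact = poisson_exact(23, 26, 25)
approx = poisson_normal_approx(23, 26, 25)
print(round(exact, 4), round(approx, 4))
```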
Example 9.19 The number of calls received by an office switchboard per hour follows a Poisson distribution with parameter 25. Find the probabilities that in one hour (a) there are between 23 and 26 calls (inclusive), (b) more than 30 calls, using the normal approximation to the Poisson distribution.

Let X be the r.v., the number of calls received in one hour. Then X is p(x; 25). We require (a) $P(23 \le X \le 26)$ and (b) $P(X > 30)$.

Using the normal approximation, X is N(25, 25).

a) $P(23 \le X \le 26)$ becomes on the continuous scale $P(22.5 < X < 26.5)$. The z-values are:

At x = 22.5, $z = \dfrac{22.5 - 25}{5} = -0.5$, and
at x = 26.5, $z = \dfrac{26.5 - 25}{5} = 0.3$.

$P(22.5 < X < 26.5) = P(-0.5 < Z < 0.3)$
$= P(-0.5 < Z < 0) + P(0 < Z < 0.3)$
$= 0.1915 + 0.1179 = 0.3094$

b) $P(X > 30)$ becomes on the continuous scale $P(X > 30.5)$.

$$P(X > 30.5) = P\left(Z > \frac{30.5 - 25}{5}\right) = P(Z > 1.1)$$

$= 0.5 - P(0 < Z < 1.1)$
$= 0.5 - 0.3643 = 0.1357$


9.5.8 Fitting a Normal Distribution. There are two possible situations to deal with.

a) Given an observed frequency distribution with k classes, we need to calculate frequencies for a normal distribution with the same mean and standard deviation as the given data.

b) Given only values of the mean $\mu$ and variance $\sigma^2$, we need to determine the number of classes (and hence their width) for the presentation of the distribution.

Case (a). To fit a normal distribution to an observed frequency distribution when neither the mean $\mu$ nor the variance $\sigma^2$ is known, we proceed as below:

i) We estimate the two parameters $\mu$ and $\sigma^2$ by calculating $\bar{X}$ and $s^2$ from the observed frequency distribution.

ii) We calculate the standard normal z-values corresponding to the upper class-boundaries by subtracting the estimated mean from each upper class-boundary and dividing by the estimated standard deviation. As the normal distribution is defined from $-\infty$ to $+\infty$, we extend the lowest class backward to $-\infty$ and the last (highest) class forward to $+\infty$.

iii) We find the cumulative probability $P(Z < z) = \Phi(z)$ associated with each z-value, using Tables of the standard normal distribution.

iv) We then obtain the probability, p (or area), for each class by successive subtraction.

v) We finally get the expected (theoretical) frequencies by multiplying each of the probabilities by the total frequency.

This procedure is also known as fitting a normal distribution to a given frequency distribution by the area method.

Case (b). To fit a normal distribution when we are given only the values of the two parameters $\mu$ and $\sigma^2$, we proceed as follows:

i) We determine the practical range of the distribution as $\mu - 3\sigma$ to $\mu + 3\sigma$, since normal distributions are practically covered by a range of 6 standard deviations.

ii) We choose a number of classes (between 6 and 15 classes) of a convenient width within this range.

iii) We proceed as in Case (a) above to calculate the probabilities of the theoretical normal distribution. The expected frequencies can be obtained when the total number of observations (total frequency) is available.
The following examples illustrate the procedure for fitting a normal distribution:

Example 9.20 Fit a normal distribution to the following frequency distribution of weights:

Weight (kg) | 28-31 | 32-35 | 36-39 | 40-43 | 44-47 | 48-51 | 52-55 | 56-59 | 60-63 | …
f           | 1     | 14    | 56    | 172   | 245   | 263   | 154   | 67    | 23    | …

We first calculate the mean and standard deviation of this distribution to estimate $\mu$ and $\sigma$, the two parameters of the normal distribution.

Computation of mean and standard deviation.

$\bar{x} = 49.5 - 1.79 = 47.71$ kg,
$s = 5.88$ kg.

The necessary calculations for the expected frequencies of the fitted distribution are shown below:

Classes | Upper class boundary | z | $P(Z < z) = \Phi(z)$ | Probability p | Expected frequency $(p \times \sum f)$
Example 9.21 Fit a normal distribution given mean $\mu = 60$ and standard deviation $\sigma = 2.5$.

Since the bulk of the normal distribution lies between $\mu - 3\sigma$ and $\mu + 3\sigma$, the range of classes would be $60 \pm 3(2.5)$, i.e. 52.5 to 67.5.

As range = 15, we use 8 classes with a common width h = 2. We then construct the classes 52.5-54.5, 54.5-56.5, ..., 66.5-68.5. Following are the necessary computations:

Classes | Upper class boundary | z | $\Phi(z)$ | Probability (p)

Up to 52.5
52.5-54.5
54.5-56.5
56.5-58.5
58.5-60.5
60.5-62.5
62.5-64.5
64.5-66.5
Over 66.5

If the total frequency were available, we could have obtained the expected frequencies by multiplication of the probabilities by the total frequency.
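The area method described above, applied to this example, can be sketched as follows. The function name `class_probabilities` is our own, `statistics.NormalDist` stands in for the normal tables, and the total frequency of 1000 used at the end is a purely hypothetical value introduced only to show the last step.

```python
from statistics import NormalDist

def class_probabilities(upper_boundaries, mu, sigma):
    """Class probabilities by successive subtraction of Phi(z) at the upper class boundaries.

    The lowest class is extended back to -infinity and one extra class runs
    from the last boundary to +infinity, where Phi = 1.
    """
    std = NormalDist()
    cumulative = [std.cdf((b - mu) / sigma) for b in upper_boundaries] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Upper boundaries of "Up to 52.5", 52.5-54.5, ..., 64.5-66.5; the last class is "Over 66.5".
boundaries = [52.5, 54.5, 56.5, 58.5, 60.5, 62.5, 64.5, 66.5]
probs = class_probabilities(boundaries, mu=60, sigma=2.5)

total_frequency = 1000  # hypothetical total, not given in the example
expected = [p * total_frequency for p in probs]
for p, e in zip(probs, expected):
    print(round(p, 4), round(e, 1))
```

The probabilities necessarily add up to 1, so the expected frequencies add up to the total frequency.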


9.5.9 Ordinates of Normal Curve. The ordinate (height) is the value of f(x) corresponding to a specified value of X. For convenience, the ordinates of the standard normal curve at various distances from the mean have been tabulated. Table 9.3 on page (383) gives the ordinates obtained from the function

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad\text{where } z = \frac{x - \mu}{\sigma},$$

for different positive values of z. Because of symmetry, ordinates at positive values of z are equal to ordinates at negative values of z.

We calculate the heights of the ordinates for an observed frequency distribution having k classes and a common class-width h by:

i) calculating $\bar{x}$ and s, as estimates of $\mu$ and $\sigma$, from the given distribution;

ii) converting the class-marks, $x_i$, into standard normal z-values by the relation $z = \dfrac{x_i - \bar{x}}{s}$;

iii) finding the ordinates $\phi(z)$ corresponding to each z-value from the Table of Ordinates;

iv) multiplying $\phi(z)$ by the factor $\dfrac{nh}{s}$, where n is the total frequency.

We need the heights of a number of ordinates when we wish to draw the graph of the fitted normal curve.
Table 9.3 Ordinates of the Normal Curve $\phi(z)$

Reproduced from Table II of Fisher and Yates: "Statistical Tables for Biological, Agricultural and Medical Research", published by Oliver and Boyd, Ltd., Edinburgh, by permission of the author and publishers.
Example 9.22 Find the ordinates of the standard normal curve at (i) z = 0.64, (ii) z = 2.18, (iii) z = −0.08.

i) To find the ordinate at z = 0.64, in Table 9.3 on page 383, we move downward in the column marked Z to reach the entry 0.6, and then move across that row to the column headed 0.04 to find the entry 0.3251, which is the desired ordinate.

ii) Similarly, the ordinate at z = 2.18 is 0.0371.
iii) By symmetry, (ordinate at z = −0.08) = (ordinate at z = 0.08) = 0.3977
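The tabulated ordinates, and the scaling by nh/s described in Section 9.5.9, can be computed directly from the formula for $\phi(z)$. In the sketch below the function names are ours; the values $\bar{x} = 47.71$, s = 5.88, n = 1000 and h = 4 in the last line are those of the weight distribution in Example 9.20.

```python
import math

def ordinate(z):
    """Height of the standard normal curve at z: phi(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def fitted_height(x, mean, sd, n, h):
    """Ordinate of the fitted curve at class-mark x: phi(z) multiplied by n*h/sd."""
    z = (x - mean) / sd
    return ordinate(z) * n * h / sd

for z in (0.64, 2.18, -0.08):
    print(z, round(ordinate(z), 4))

# Maximum ordinate of the curve fitted to the weights of Example 9.20, at x equal to the mean.
print(round(fitted_height(47.71, 47.71, 5.88, 1000, 4), 2))
```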
Example 9.23 Find the ordinates of the frequency distribution of weights given in Example 9.20.

We have calculated $\bar{x} = 47.71$ kg and $s = 5.88$ kg for the distribution of weights in Example 9.20.

The procedure for calculating the heights of the ordinates for the given frequency distribution is shown in the table below:

EXERCISES
OBJECTIVE
a) Answer 'True' or 'False'. If the statement is not true then replace the underlined words with words that make the statement true:

i) The mean and standard deviation of the exponential distribution are equal.
ii) All continuous random variables follow a Normal Probability Distribution.
iii) Not all normal distributions can be transformed to a Standard Normal Distribution.
iv) For any continuous random variable X, $P(X \ge 1.0) = P(X > 1.0)$.
v) The area under the normal curve to the left of its mean is −0.5.
vi) The normal distribution is symmetrical about zero.
vii) The mean, median and mode of a normal probability distribution are not equal.
viii) The mean of the standard Z score is one and its standard deviation is equal to zero.
ix) The total area under the curve of any normal distribution is two.
x) If the computed value of Z is zero, then the value of the normal random variable is greater than the mean of this variable.
xi) The total area under any normal curve is always 0.5.
xii) Z scores for a standard normal r.v. are the set of all whole numbers.
xiii) The second quartile of the standard normal variable is larger than its median.
xiv) Transforming a normal distribution to a standard normal distribution will not change the mean of the distribution.
xv) The standard score unit is the same as the data unit.


MULTIPLE CHOICE QUESTIONS

i) The area under the standard normal curve between −3.0 and −2.0 is
a) 0.0028
b) 0.4472
c) 0.02165
d) 0.3413

ii) Which is the characteristic of the normal distribution?
a) It is a bell shaped and symmetric curve.
b) For any normal r.v. X, $P(X < \mu) = P(X > \mu)$.
c) The total area under the curve is unity.
d) All of above.

iii) All normal distributions are:
a) Symmetrical.
b) Having two parameters $\mu$ and $\sigma$.
c) Bell shaped.
d) All of above.

iv) For a standard normal probability distribution the mean and standard deviation are:
a) $\mu = 1$ and $\sigma = 1$
b) $\mu = 0$ and $\sigma = 1$
c) $\mu = 50$ and $\sigma = 10$
d) All of above.

v) The middle area under the normal curve with $\mu \pm 2\sigma$ is
a) 0.6827
b) 1.0000
c) 0.9545
d) 0.9973

vi) For a normal distribution with $\mu = 50$ and $\sigma = 10$, how much area will be … X = 50?
a) 0.35
b) 0.95
c) 0
d) 0.5

vii) In a normal distribution, mean deviation is equal to
a) $1\sigma$
b) $0.8\sigma$
c) $0.6745\sigma$
d) $2.0\sigma$

viii) The normal distribution will be less spread out when
a) The mean is small
b) The median is small
c) The mode is small
d) The standard deviation is small.

ix) The lifetime of general tires is normally distributed with an average of … and a standard deviation of 5000 kilometers. The probability that a randomly … more than 50,000 kilometers is
a) 0.6789
b) 0.9772
c) 0.0228
d) 0.1600

x) Which of the following statements is correct for the standard normal distribution?
a) P(Z > −2.0) = P(Z > 2.0)
b) P(Z > 2.0) = P(Z < 2.0)
c) P(Z > −2.0) = P(Z < −2.0)
d) All of above.


CTIVE 
Find the mean, variance and m.g.f. of the uniform distribution. 


x Find the moment genérating function and the first four moments of the rectangular 
distribution on (-1/2, 1/2). 


Describe an exponential distribution and derive its mean and standard deviation. 


Suppose the average length of life of a colour television tube is 12 months. What is the 
probability that the length of life. is equal to or greater ar 18 months? Assume ап 
exponential distribution. Мы 


“If X has an exponential distribution given by 


үзе” 
faze, 45% 
what are the mean, variance and m.g.f. of XR calculate Р(Х>3) and Р(Х>5 | X22). 


0<х<. 


The distance, x, in kilometers tray, у customers to the “Cheap Supermarket” are 
distributed with the density functione | 
1 д> e 
fe)-scem “ 0<х<. 
N 
elsewhere. 
Find the proporti customers travelling less than 1 kilometer and the proportion 


travelling more than Ё5 kilometers to the Super market. 
Show that the standard deviation of the density function f(x) = ae ^, where x takes all 
values from 0 to es. 
Write down the zr... for the distribution given by 
fa)» ie, б<х<в. 
Derive the first four moments about the mean. 
f(x)2xe*, 0<х< >. 


=0, otherwise. 


first four moments using moment generating function. (P.U., B.A. (Hons.) Part-II, 1963)
a) Show that for the exponential distribution


$y_0$ being constant, the mean and the standard deviation are each equal to $c$ and
interquartile range is $c \log_e 3$. Also find $\mu_r'$ and show that $\beta_1 = 4$ and $\beta_2 = 9$.


Ms 


b) 


$dy = y_0 e^{-x/c}\,dx$,


Let X have probability density 
"lg 
f(x) 25 


Find the expectation and variance of X: 


The density function of a random variable X is given by


joz 
e* +e 


Find (i) the constant a, (ii) the probability that in rcd observations 


on values less than 1. 


M 
The Gamma distribution is given by S 
qv 
ЧА 1 -x „m-l ^ 
Ea БХ) 
Г(т) e 
уо 


where Г(т) «fe 6% 


A 


Show that the I the variance are each е 
mean is 2m. 


Find the e Gamma distribution. 


Find the rth moment of the Gamma distribution ( 
m.g.f. 


Show that the sum of two independent Gamma variables with parameters m 


Gamma variable with parameter m + n. 
The Beta distribution of the first kind is given by


x" (1- x)", 


fo ^x 


where B(/, т) = (ға = x)" dx. 


Show that the mean is m and the variance is 
m 


(Lm) (1+ m1) 


D 


0<х< о, с>0, 


(Р.О.,В.А 


-%9<х<ш. 


-ю<х<о. 


` (P.U., В.А. (Hons.) 


qual to т and the third mo; 


i) without using ће т.р./, (ii) 


0<х<1, 


Im 




Find the harmonic mean of the Beta distribution. 
\ (P.U., B.A. (Hons.) Part-III, 1967) 


Defining a ү (m) variate as one with a probability density function оҒ the form 


x"'e 
T(m) 


fe , forx2:0, m being a “уе constant. ` 


Obtain the distribution of T where X and Y are independent y (m) апа y (m) variates 


respectively. 


IUS ince 


Show that the mean values of the positive square root ofa y (I) variate is TO 


prove that the mean deviation of normal variate from its mean is $\sigma\sqrt{2/\pi}$.
(P.U., B.A. (Hons.) Part-III, 1966) 
a) Define the Normal Distribution and obtain its mean and variance.


b) Show that for the normal distribution, the mean, mode and median are the same.
(B.Z.U., B.A./B.Sc. 1976)


State the mathematical equation of the Normal distribution and show that


i) the area under the normal curve is unity;


ii) the normal curve has points of inflection which are equidistant from the mean;
(P.U., B.A./B.Sc. 1986, 88, 93)
iii) all odd order moments about mean vanish. (P.U., B.A./B.Sc. 1973)


The continuous random variable X has г Кх), where 


аа Эл әле. Ію<х<о 
с : = 
NS | ees 
Derive the mean өлуде variance of the distribution. 


Show that the maximum value of f(x) occurs at $x = \mu$.


Show that there are points of inflection at $x = \mu + \sigma$ and $x = \mu - \sigma$.


If f(x) = ke C 59 is the equation of a normal curve, find the value of k, the mean and 


standard deviation. E (P.U., В.А./В.ӛс. 1984) 
Show that for the normal distribution, the mean deviation from the mean is approximately
4/5 of its standard deviation. (P.U., B.A./B.Sc. 1983, 87)


Show that, for the normal distribution, the quartile deviation, the mean deviation and the standard
deviation are approximately in the ratio 10:12:15. (B.Z.U., B.A./B.Sc. 1990)


А 
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9.17 


9.19 


9.20 


9.21 


9.23 


a) Obtain the moment generating function of the standardized normal distribution. 


b) Show that for the normal distribution, moments of odd order about mean are all zero and
moments of even order are given by


$\mu_{2n} = \dfrac{(2n)!}{2^n\, n!}\,\sigma^{2n}$.


a) Show that the m.g.f. of the normal distribution with mean $\mu$ and variance $\sigma^2$ is
$M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$.
b) What percentage of the normal distribution with mean $\mu$ and variance $\sigma^2$ is


between the points (i) $\mu$ and $(\mu + 1.54\sigma)$, (ii) $(\mu - 1.73\sigma)$ and $(\mu + 0.56\sigma)$?
(P.U., В.А. 
a)  IfXis normally distributed with zero mean and с =0.6, find Р(Х>0) and P(0.2<% 
b) Бога normal distribution with mean 1 and standard devágtion 3, find the probabilitz 


i) Р(3.43< X < 6.19), ii) Р(-1.43< X 19): 
A normal distribution has mean = 12 and c —2, find (зге under the curve 


а) from Х=10 to X=13.5, b) from X=11. 14.2, с) from X=6 to X=18. 
Let X be М100, 225). Find the following ihities: 

a) Р(Х<92.5), b) PC 0507.5), с) Р(Х> 124),, 

d) Р(112<Х<128.5), е) Х<127), f) Р(Х>76). 


a) IfXisN(u,c*) and if qe b, then show that Y is N(a p +b; a^ o? ). 

b) Let Y=5X+10 and A normally distributed with a mean 10 and variance 25. 
following: S ы. 
i) P(Y <54 қ (ii) P(Y2 68), (iii) P(52< У<67). (B.ZU., B.AJ/B. 

a) Scores on a certain nation-wide college entrance examination follow a normal ds 
with a mean of 500 and a standard deviation of 100. Find the probability that a 
score (i) over 650, (ii) less than 250, and (iii) between 325 and 675. 

b) Given that the height of college boys is normally distributed with mean 5'—2” 
deviation 4", and that the minimum height required for joining the N.C.C. is 5° 
the percentage of boys who would be rejected on account of their height. 


a) - If the heights (X) of college students are normally distributed with mean 69 and 
find the probability that (i) Х<65 and (ii) 65 < Х<70. 


b)  Ifthemgf оҒХіз М()- е" find 
P(-4 € Х<1б) and Р(-10<Х<0). 




25 Suppose that weights of 2000 male students are normally distributed with mean 155 pounds and 
standard deviation 20 pounds. Find the number of students with weights (1) less than or equal to 
100 pounds, (ii) between 120 and 130 pounds, (iii) between 150 and 175 pounds, (iv) greater than 
or equal to 200 pounds. 


26 a) The mean life of stockings used by an army was 40 days, with a standard deviation of 8
days. Assume the life of the stockings follows a normal distribution. If 100,000 pairs are
issued, how many would need replacement before 35 days? After 46 days?

(B.Z.U., B.A./B.Sc. 1990)


b) The time taken by a milkman to deliver milk to the GOR Estate is normally distributed with
mean 12 minutes and standard deviation 2 minutes. He delivers milk everyday. Estimate the
number of days during the year when he takes (i) longer than 17 minutes, (ii) less than 10
minutes, (iii) between 9 and 13 minutes.


The scores made by candidates in a certain test are normally distributed with mean 500 and
standard deviation 100.


a) What percent of the candidates received scores (i) greater than 700, (ii) less than 400,
(iii) between 400 and 600, (iv) which differ from mean by more than 150?
b) If a candidate gets a score of 680, what percent of the candidates have higher scores than he?
(P.U., B.A./B.Sc. 1980)


^ 


A man goes by car to his office, and the route through the city centre takes him, on the average, 27
minutes with a standard deviation of 5 minutes. With the opening of a new ring road, the man can
bypass the congestion of the city centre, but the journey now takes, on the average, 29 minutes with
a standard deviation of 2 minutes. Assuming that both journey times are normally distributed,
determine which route is the better one if the man has (i) 28 minutes, and (ii) 32 minutes to reach
his office for an appointment. (LU., M.Sc. 1990)
Hint. The better route is one that gives the smaller probability of the man's being late for
appointment.
=) A random variable XR 16). If P(X > a) = 0.5636, find the value of a. 
b) A possible measure of kurtosis (i.e. flatness) is given by $k = \dfrac{Q.D.}{P_{90} - P_{10}}$, where Q.D. is the
Semi-Interquartile range, and P's are the percentiles. Use the standard normal table to
estimate the value of k for a normal distribution.


з normal distribution with 1=47.6 and с =16.2, find (i) the probability that a single observation 
be larger than 50, (ii) two points such thai a single observation has a 9796 probability of falling 
n them, (iii) Pio, Рзо and Pog. (P.U. В.А./В.5с. 1976) 


А random sample of 1,000 iron rods are tested for their length and it is found that the mean 
and the standard deviation are 14.40 metres and 2.50 metres respectively. If the lengths of 
the rods are normally distributed, then find (i) how many rods will be between 12 and 16 
metres? (ii) what are the chances that any rod selected at random will be 15 metres length or 
above? 




b) The mean score of 1000 students appearing for an examination is 34.4 and the standard
deviation is 16.6. How many candidates may be expected to obtain marks between 30 and
60 assuming the normality of the distribution? Under the same assumptions, determine
the limits of marks of the central 70% of the candidates.


A soft drink machine is regulated so that it discharges an average of 200 milliliters per cup. If the
amount of drink is normally distributed with a standard deviation equal to 15 milliliters,


a) what fraction of the cups will contain more than 240 milliliters?
b) what is the probability that a cup contains between 191 and 209 milliliters?
c) how many cups will likely overflow if 230 milliliters cups are used for the next 1000 drinks?


d) below what value do we get the smallest 25% of the drinks?
(P.U., B.A./B.Sc.


а) The heights of applicants to the police force are normally distributed with mean 170 
standard deviation 3.8cm. If 30% of applicants are rejected because mer are too small 
is the minimum acceptable height for the police force? 5% 


b) The average life of a certairetype of small moto: e years with standard devi 
years. The manufacturer replaces free all motors fiat fail while under guarantee. 
willing to replace only 3% of the motors t il, how long a guarantee should 
Assume that the lives of the motors d dona distribution. 


An architect is designing the interior doar men’s gymnasium. He wants to make 
enough so that 95 percent of the me: g the doors will have at least a one-foot 
Assuming that the heights will be y distributed, with a mean of 70 inches and & 
deviation of 3 inches, how high пт е architect make the doors? (P.U., В.А.В, 


a) Ina normal Mes lower and upper quartiles are respectively 8 and 17 
mean and standard e tion of the normal distribution. (B.LS.E., Lal 


b) Ina normal on, 31% of the items are under 45 and 8% are over 64. Find 8 
and standar ation of the distribution. 


c) Let X be normally distributed with mean p and variance o°, so that P(X«89) 
Р(Х>94) = 0.05. Find р and c? . (P.U., B. 


Assuming that the number of marks scored by a candidate is normally distributed, 
and the standard deviation, if the number of first class students (60% or more 
number of failed students (less than 30% marks) is 90 and the total number of candi 
for the examination is 450. 


А boy is trying to climb a slippery pole and finds that he can climb to a height of at 
once in five attempts, and to a height of at least 1.70m nine times out of ten att 
that the heights he can reach in various attempts form a normal distribution, calculate $ 
standard deviation of the distribution. Calculate also the heights that the boy can 
once in one thousand attempts. (LU., 
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Prove that the Binomial distribution (g+p)" tends to become a normal distribution for large 
values of n. 


Supposing that the death rate from Malaria is 20%, find the probability that the number of 
deaths in a particular village is between 70 and 80 out of 500. 
(P.U., B.Sc. Hons. Part-II, 1972) 


Explain the reason why the correction for continuity is usually made when we apply the 
normal approximation to the binomial distribution. 


Find the probability that 200 tosses of a fair coin will result in (i) between 80 and 120 heads 


inclusive, (ii) less than 90 heads, and (iii) exactly 100 heads. (P.U., B.A./B.Sc. 1977) 
A coin is tossed 200 times. Find the probability of getting (i) between 105 and 110 heads 
inclusive, (ii) less than 95 heads. (P.U., В.А./В.ӛс. 1983) 


Find the probability of obtaining between six and nine heads inclusive in 15 tosses of an 
ideal coin by applying (i) the binomial distribution, (ii) the normal approximation to the 
binomial, with correction for continuity. о (P.U., В.А /В.$с. 1978) 


А telephone exchange receives, on average, 5 calls рег тий. Find the probability that in a 
20-minute period no more than 102 calls are received. X e 

If X is b(x; 20, 0.4) find $P(6 \le X \le 10)$. Then find approximations to this probability using
(i) the Poisson distribution, (ii) the normal distribution.

Find the ordinates of the normal cu ж (i) 2=0.064, (ii) z=1.27, (їп) 2=0.84 апа 
(іу) 2--2.08. en 

Fit a normal distribution given 1—27.0, standard deviation c —2.2 and total 
frequency=209. w 


e 
The following table gives the dition of statures among the first year students of a university: 
Stature (їп.): 61 62 Na 65 66 67 68 69 70 71 72 73 74 
Frequency: 2 iu 38..57. 93 106. 126 1090 87 75 23 3 4 


=) Test the normality of the distribution by comparing the proportion of the cases lying between 
X+s, Х + 25, Х ® 3з for the distribution and for the normal distribution. Е 


Ы Fit a normal distribution to the data, using area Tables. 


Fit a normal distribution by area method to the following data. Also calculate ordinates. 


Compare actual and expected frequencies on a graph. 
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9.45 Calculate the frequencies of the normal distribution which has the same total been 
standard deviation as the following distribution ў 


10.1 INTRODUCTION


The term regression was introduced by the English biometrician, Sir Francis Galton (1822-1911) to
describe a phenomenon which he observed in analyzing the heights of children and their parents. He
found that, though tall parents have tall children and short parents have short children, the average height
of children tends to step back or to regress toward the average height of all men. This tendency toward
the average height of all men was called a regression by Galton.


Today, the word regression is used in a quite different sense. It investigates the dependence of one
variable, conventionally called the dependent variable, on one or more other variables, called
independent variables, and provides an equation to be used for estimating or predicting the average value
of the dependent variable from the known values of the independent variable. The dependent variable is
assumed to be a random variable whereas the independent variables are assumed to have fixed values, i.e.
they are chosen non-randomly. The relation between the expected value of the dependent variable
and the independent variable is called a regression relation. When we study the dependence of a variable on
a single independent variable, it is called a simple or two-variable regression. When the dependence of a
variable on two or more than two independent variables is studied, it is called multiple regression.
Furthermore, when the dependence is represented by a straight line equation, the regression is said to be
linear; otherwise it is said to be curvilinear.


It is relevant to note that in regression study, a variable whose variation we try to explain is a
dependent variable while an independent variable is a variable that is used to explain the variation in the
dependent variable.


Some more terminology: The dependent variable is also called the regressand, the predictand, the
response or the explained variable whereas the independent or the non-random variable is also referred to
as the regressor, the predictor, the regression variable or the explanatory variable.


10.2 DETERMINISTIC AND PROBABILISTIC RELATIONS OR MODELS


The relationship among variables may or may not be governed by an exact physical law. For
convenience, let us consider a set of n pairs of observations $(X_i, Y_i)$. If the relation between the variables is
exactly linear, then the mathematical equation describing the linear relation is generally written as

$Y_i = a + bX_i$,

where a is the value of Y when X equals zero and is called the Y-intercept, and b indicates the change in Y
for a one-unit change in X and is called the slope of the line. Substituting a value for X in the equation, we
can completely determine a unique value of Y. The linear relation in such a case is said to be a
deterministic model. An important example of the deterministic model is the relationship between Celsius
and Fahrenheit scales in the form of $F = 32 + \frac{9}{5}C$. Another example is the area of a circle expressed by
the relationship, area $= \pi r^2$. Such relations cannot be studied by regression.


In contrast to the above, the linear relationship in some situations is not exact. For example, we
cannot precisely determine a person's weight from his height as the relationship between them is not
known to follow an exact linear form. The weights for given values of height are reasonably assumed to
vary because of random errors of measurement. The deterministic relation in such cases is then modified to allow
for the inexact relationship between the variables and we get what is called a non-deterministic or
probabilistic model as

$Y_i = a + bX_i + e_i, \quad (i = 1, 2, \ldots, n)$

where $e_i$'s are the unknown random errors.
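
The contrast between the two kinds of model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heights, the coefficients -200 and 5, the error standard deviation 8 and the function names are all hypothetical and are not taken from the text.

```python
import random


def fahrenheit(celsius):
    """Deterministic model: every C maps to exactly one F = 32 + (9/5) C."""
    return 32 + 9 * celsius / 5


def probabilistic_y(a, b, xs, sigma, seed=0):
    """Probabilistic model: Y_i = a + b*X_i + e_i, with e_i drawn from N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]


heights = [60, 64, 68, 72]  # hypothetical heights (inches)
exact = [fahrenheit(c) for c in (0, 37, 100)]          # no scatter at all
weights = probabilistic_y(-200.0, 5.0, heights, 8.0)   # scatter about the line
errors = [w - (-200.0 + 5.0 * h) for w, h in zip(weights, heights)]
print(exact)
print([round(e, 2) for e in errors])
```

Setting `sigma` to zero makes the second function collapse to the deterministic line, which is one way to see that the error term is the only difference between the two models.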


10.3 SCATTER DIAGRAM 


A first step in finding whether or not a relationship between two variables exists, is to plot each
pair of independent-dependent observations $\{(X_i, Y_i),\ i = 1, 2, \ldots, n\}$ as a point on graph paper, using the
X-axis for the regression variable and the Y-axis for the dependent variable. Such a diagram is called a
scatter diagram or a scatter plot. If a relationship between the variables exists, then the points in the
scatter diagram will show a tendency to cluster around a straight line or some curve. Such a line or curve
around which the points cluster, is called the regression line or regression curve which can be used to
estimate the expected value of the random variable Y from the values of the nonrandom variable X.


The scatter diagrams shown below reveal that the relationship between two variables in (a) is
positive and linear, in (b) is negative and linear, in (c) is curvilinear and in (d) there is no relationship.


10.4 SIMPLE LINEAR REGRESSION MODEL 


We assume that the linear relationship between the dependent variable $Y_i$ and the value $X_i$ of the
regressor X is


$Y_i = \alpha + \beta X_i + \varepsilon_i$,

where the $X_i$'s are fixed or predetermined values,
the $Y_i$'s are observations randomly drawn from a population,
the $\varepsilon_i$'s are error components or random deviations,
$\alpha$ and $\beta$ are population parameters, $\alpha$ is the intercept and the slope $\beta$ is called regression
coefficient, which may be positive or negative depending upon the direction of the relationship
between X and Y.


Furthermore, we assume that


i) $E(\varepsilon_i) = 0$, i.e. the expected value of error term is zero; it implies that the expected value of Y is
related to X in the population by a straight line;


ii) $\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$ for all i, i.e. the variance of error term is constant. It means that the
distribution of error has the same variance for all values of X (homoscedasticity assumption);


iii) $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \ne j$, i.e. error terms are independent of each other (assumption of no
serial or auto correlation between $\varepsilon$'s);


iv) $E(X_i \varepsilon_i) = 0$, i.e. X and $\varepsilon$ are also independent of each other;


v) $\varepsilon_i$'s are normally distributed with a mean of zero and a constant variance $\sigma^2$. This implies
that Y values are also normally distributed. The distributions of Y and $\varepsilon$ are identical except
that they have different means. This assumption is required for estimation and testing of
hypothesis on linear regression.

According to this population regression model, each $Y_i$ is an observation from a normal distribution
with mean $= \alpha + \beta X_i$ and variance $= \sigma^2$. Thus the relation expressed alternatively as


$E(Y) = \alpha + \beta X$,


implies that the expected value of Y is linearly related to X and the observed value of Y deviates
from the line $E(Y) = \alpha + \beta X$ by a random component $\varepsilon$, i.e. $\varepsilon_i = Y_i - (\alpha + \beta X_i)$. The following graph
shows the assumed line, giving E(Y) for the given values of X.


[Figure: the population regression line $E(Y) = \alpha + \beta X$, with the values $X_1$, $X_2$, $X_3$ marked on the X-axis.]

In practice, we have a sample from some population, therefore we desire to estimate the population
regression line from the sample data. Then the basic relation in terms of sample data may be written as


$Y_i = a + bX_i + e_i$,


where a, b and $e_i$ are the estimates of $\alpha$, $\beta$ and $\varepsilon_i$. The estimated regression is generally written as

$\hat{Y} = a + bX$.

Many possible regression lines could be fitted to the sample data, but we choose that particular line
which best fits that data. The best regression line is obtained by estimating the regression parameters by
the most commonly used method of least squares which we describe in the following subsection.


10.4.1 An Aside: The Principle of Least Squares. The principle of least squares consists
of determining the values of the unknown parameters that will minimize the sum of squares of errors (or
residuals) where errors are defined as the differences between observed values and the corresponding
values predicted or estimated by the fitted model equation.


The parameter values thus determined, will give the least sum of the squares of errors and are
known as least squares estimates. The method of least squares that gets its name from the minimization
of a sum of squared deviations is attributed to Karl F. Gauss (1777-1855). Some people believe that the
method was discovered at the same time by Adrien M. Legendre (1752-1833), Pierre S. Laplace
(1749-1827) and others. Markov's name is also mentioned in connection with its further development. In
recent years, efforts have been made to find better methods of fitting but the least squares method remains
dominant and is used as one of the important methods of estimating the population parameters.


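10.4.2 Least-Squares Estimates in Simple Linear Regression. Let there be a set of n pairs of observations
$\{(X_i, Y_i),\ i = 1, 2, \ldots, n\}$, with the population relation and the sample relation

$Y_i = \alpha + \beta X_i + \varepsilon_i$


$Y_i = a + bX_i + e_i$


where a and b are the least-squares estimates of $\alpha$ and $\beta$, and $e_i$, commonly called residual, is the deviation
of the observed $Y_i$ from its estimate provided by $\hat{Y}_i = a + bX_i$.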

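According to the principle of least-squares, we determine those values of a and b which
minimize the sum of squares of the residuals. In other words, the best regression line is the one that
minimizes the sum of the squares of the vertical deviations between the observed values and the
corresponding values predicted by the regression model, i.e. $\hat{Y}_i = a + bX_i$. That is, the least squares method
minimizes


$S(\alpha, \beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2$


or in terms of sample data as


$S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$.


[Figure: an observed point $(X_i, Y_i)$ and the fitted line, with the residual $e_i = Y_i - a - bX_i$ shown as the vertical deviation of $Y_i$ from $\hat{Y}_i$.]


As a and b, the two quantities that determine the line,
vary, S(a, b) will vary too. We therefore consider
S(a, b) as a function of a and b, and we wish to
determine at what values of a and b, it will be
minimum.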


To minimize S(a, b), we need to set its partial derivatives w.r.t. a and b equal to zero:

$\dfrac{\partial S(a, b)}{\partial a} = 2\sum (Y_i - a - bX_i)(-1) = 0$, and


$\dfrac{\partial S(a, b)}{\partial b} = 2\sum (Y_i - a - bX_i)(-X_i) = 0$.

Simplifying, we obtain the following two equations, called the normal equations (the word normal
is used here in the sense of regular or standard).


$\sum Y_i = na + b\sum X_i$ and $\sum X_iY_i = a\sum X_i + b\sum X_i^2$


These two normal equations are solved simultaneously for values of a and b either by direct
elimination or by using determinants.


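i) Direct Elimination: Multiplying the first equation by $\sum X$ and the second equation by n, we
get

$\sum X \sum Y = na\sum X + b(\sum X)^2$ and $n\sum XY = na\sum X + nb\sum X^2$.


Subtracting, we get

$n\sum XY - \sum X\sum Y = b\,[\,n\sum X^2 - (\sum X)^2\,]$


or $b = \dfrac{n\sum XY - \sum X\sum Y}{n\sum X^2 - (\sum X)^2} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$,


which is the least-squares estimate of the regression co-efficient $\beta$.


Similarly, we get


$a = \dfrac{\sum X^2\sum Y - \sum X\sum XY}{n\sum X^2 - (\sum X)^2}$,

the least-squares estimate of $\alpha$.


Alternatively, we divide the first normal equation by n, and get the least-squares estimate of $\alpha$ as

$a = \bar{Y} - b\bar{X}$.


This also shows that the estimated regression line passes through $(\bar{X}, \bar{Y})$, the means of the data.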
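
The two formulas just derived translate directly into code. The function below is a minimal sketch (the name `least_squares` and the four data points are illustrative assumptions, not from the text); it computes b from the raw sums and then a from $a = \bar{Y} - b\bar{X}$.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y = a + b*X fitted by least squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom   # slope from the normal equations
    a = sum_y / n - b * sum_x / n              # a = Y-bar - b * X-bar
    return a, b


# Illustrative points lying exactly on Y = 1 + 2X.
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the illustrative points lie exactly on a line, the fit recovers that line's intercept and slope; with scattered data the same function returns the line that minimizes S(a, b).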
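ii) By means of determinants, the solution is


$b = \dfrac{\begin{vmatrix} n & \sum Y \\ \sum X & \sum XY \end{vmatrix}}{\begin{vmatrix} n & \sum X \\ \sum X & \sum X^2 \end{vmatrix}} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$,


$a = \dfrac{\begin{vmatrix} \sum Y & \sum X \\ \sum XY & \sum X^2 \end{vmatrix}}{\begin{vmatrix} n & \sum X \\ \sum X & \sum X^2 \end{vmatrix}} = \dfrac{(\sum X^2)(\sum Y) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}$.


These estimates give us the regression equation

$\hat{Y} = a + bX = \bar{Y} + b(X - \bar{X})$,

where $b = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$.


Since Y is a random variable, therefore the deviations in the Y direction are taken into account in
determining the best-fitting line.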


It is very important to note that, when both X and Y are observed at random, i.e. the sample values
are from a bivariate population, there are two regression equations, each obtained by choosing that
variable as dependent whose average value is to be estimated and treating the other variable as
independent. In case of a single random variable, the single regression equation is used to estimate the
values of either the dependent or the independent variable. In case of two regression lines, it is customary
to denote the regression coefficients of Y on X and of X on Y by $b_{yx}$ and $b_{xy}$ respectively.


Example 10.1 Compute the least squares regression equation of Y on X for the following data.
What is the regression coefficient and what does it mean?

X     5    6    8   10   12   13   15   16   17
Y    16   19   23   28   36   41   44   45   50


The estimated regression line of Y on X is


$\hat{Y} = a + bX$,

and the two normal equations are

$\sum Y = na + b\sum X$


$\sum XY = a\sum X + b\sum X^2$


Here n = 9, $\sum X = 102$, $\sum Y = 302$, $\sum XY = 3853$ and $\sum X^2 = 1308$, so that $\bar{X} = 11.33$ and $\bar{Y} = 33.56$. Then

$b = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \dfrac{9(3853) - (102)(302)}{9(1308) - (102)^2}$


$= \dfrac{34677 - 30804}{11772 - 10404} = \dfrac{3873}{1368} = 2.831$, and


$a = \bar{Y} - b\bar{X} = 33.56 - (2.831)(11.33) = 1.47$.

Hence the desired estimated regression line of Y on X is


$\hat{Y} = 1.47 + 2.831X$.


The estimated regression co-efficient, b = 2.831, indicates that the values of Y increase by
2.831 units for a unit increase in X.
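
The arithmetic of Example 10.1 can be checked with a short script. One assumption should be stated plainly: the nine (X, Y) pairs below are the values consistent with every sum quoted in the example ($\sum X = 102$, $\sum Y = 302$, $\sum XY = 3853$, $\sum X^2 = 1308$); they are used here only as a check on the calculation.

```python
# (X, Y) pairs consistent with the sums quoted in Example 10.1 (an assumption).
xs = [5, 6, 8, 10, 12, 13, 15, 16, 17]
ys = [16, 19, 23, 28, 36, 41, 44, 45, 50]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# Slope and intercept from the normal equations.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(n, sum_x, sum_y, sum_xy, sum_xx)   # 9 102 302 3853 1308
print(round(a, 2), round(b, 3))          # 1.47 2.831
```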


Example 10.2 In an experiment to measure the stiffness of a spring, the length of the spring under
different loads was measured as follows:


X = load (lb)      3    5    6    9   10   12   15   20   22   28
Y = length (in)   10   12   15   18   20   22   27   30   32   34


Find the regression equations appropriate for predicting
i) the length, given the weight on the spring;
ii) the weight, given the length of the spring. (W.P.C.S., 1964)


The data come from a bivariate population, i.e. both X and Y are random, therefore there are two
regression lines. To find the regression equation for predicting length (Y), we take Y as dependent
variable and treat X as independent variable (i.e. non-random). For the second regression, the choice of
variables is reversed.

Here n = 10, $\sum X = 130$, $\sum Y = 220$, $\sum XY = 3467$, $\sum X^2 = 2288$ and $\sum Y^2 = 5486$, so that $\bar{X} = 13$ and $\bar{Y} = 22$.

i) The estimated regression equation appropriate for predicting the length, Y, given the weight X,
is


$\hat{Y} = a_0 + b_{yx}X$,

where $b_{yx} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \dfrac{(10)(3467) - (130)(220)}{(10)(2288) - (130)^2} = \dfrac{6070}{5980} = 1.02$, and


$a_0 = \bar{Y} - b_{yx}\bar{X} = 22 - (1.02)(13) = 8.74$.

Hence the desired estimated regression equation is


$\hat{Y} = 8.74 + 1.02X$.

ii) The estimated regression equation appropriate for predicting the weight, X, given the length, Y,
is


$\hat{X} = a_1 + b_{xy}Y$,

where $b_{xy} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \dfrac{(10)(3467) - (130)(220)}{(10)(5486) - (220)^2} = \dfrac{6070}{6460} = 0.94$, and


$a_1 = \bar{X} - b_{xy}\bar{Y} = 13 - (0.94)(22) = -7.68$.


Hence $\hat{X} = 0.94Y - 7.68$ is the estimated regression equation appropriate for predicting the weight (X),
given the length (Y).
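
The two regression lines of Example 10.2 come from the same formula with the roles of the variables exchanged, which the following sketch makes explicit. The helper name `fit_line` is an assumption for illustration; the data are the ten load and length readings of the example.

```python
def fit_line(us, vs):
    """Least-squares line v = a + b*u; returns (a, b)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = (n * suv - su * sv) / (n * suu - su ** 2)
    return sv / n - b * su / n, b


load = [3, 5, 6, 9, 10, 12, 15, 20, 22, 28]        # X
length = [10, 12, 15, 18, 20, 22, 27, 30, 32, 34]  # Y

a_yx, b_yx = fit_line(load, length)   # Y on X: predict length from load
a_xy, b_xy = fit_line(length, load)   # X on Y: predict load from length
print(round(b_yx, 2), round(b_xy, 2))
print(round(a_yx, 2), round(a_xy, 2))
```

The slopes agree with the example to two decimals. The intercepts printed here differ slightly from 8.74 and -7.68 because the example rounds each slope to two decimals before computing the intercept, whereas the script carries full precision.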


10.4.3 Properties of the Least-Squares Regression Line. The least-squares linear regression line
has the following properties:


i) The least squares regression line always goes through the point $(\bar{X}, \bar{Y})$, the means of the data.

ii) The sum of the deviations of the observed values of $Y_i$ from the least squares regression line is
always equal to zero, i.e. $\sum (Y_i - \hat{Y}_i) = 0$.

iii) The sum of the squares of the deviations of the observed values from the least-squares
regression line is a minimum, i.e. $\sum (Y_i - \hat{Y}_i)^2 = $ minimum.


iv) The least-squares regression line obtained from a random sample is the line of best fit because
a and b are the unbiased estimates of the parameters $\alpha$ and $\beta$.
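
Properties (i) to (iii) can be checked numerically. The sketch below uses the spring data of Example 10.2 and unrounded estimates; the perturbations tried for property (iii) are arbitrary illustrative choices, so that part is a spot check rather than a proof.

```python
load = [3, 5, 6, 9, 10, 12, 15, 20, 22, 28]        # X, from Example 10.2
length = [10, 12, 15, 18, 20, 22, 27, 30, 32, 34]  # Y, from Example 10.2

n = len(load)
x_bar = sum(load) / n
y_bar = sum(length) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(load, length))
sxx = sum((x - x_bar) ** 2 for x in load)
b = sxy / sxx
a = y_bar - b * x_bar


def sse(a_try, b_try):
    """Sum of squared vertical deviations about the line Y = a_try + b_try * X."""
    return sum((y - a_try - b_try * x) ** 2 for x, y in zip(load, length))


residual_sum = sum(y - (a + b * x) for x, y in zip(load, length))
passes_through_means = abs(a + b * x_bar - y_bar) < 1e-9           # property (i)
residuals_sum_to_zero = abs(residual_sum) < 1e-9                   # property (ii)
is_smallest = all(sse(a, b) < sse(a + da, b + db)                  # property (iii)
                  for da, db in [(0.1, 0), (-0.1, 0), (0, 0.01), (0, -0.01)])
print(passes_through_means, residuals_sum_to_zero, is_smallest)
```

With rounded estimates such as a = 8.74 and b = 1.02 the residuals no longer sum exactly to zero, which is why worked examples usually show a small non-zero total.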


10.4.4 Standard Deviation of Regression or Standard Error of Estimate. The observed values
of (X, Y) do not all fall on the regression line but they scatter away from it. The degree of scatter
(or dispersion) of the observed values about the regression line is measured by what is called the standard
deviation of regression or the standard error of estimate of Y on X. For the population data, the standard
deviation that measures the variation of observations about the true regression line $E(Y) = \alpha + \beta X$, is
denoted by $\sigma_{Y.X}$ and is defined by

$\sigma_{Y.X} = \sqrt{\dfrac{\sum [Y - (\alpha + \beta X)]^2}{N}}$,

where N is the population size.

For sample data, we estimate $\sigma_{Y.X}$ by $s_{y.x}$, which is defined as


$s_{y.x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$,


where $\hat{Y} = a + bX$, the estimated regression line. Its square, $s_{y.x}^2$, is an unbiased estimate of $\sigma_{Y.X}^2$, the
population variance about the regression line. The standard error of estimate, $s_{y.x}$, will be zero
when all the observed values fall on the regression line. It is interesting to note that the ranges
$\hat{Y} \pm s_{y.x}$, $\hat{Y} \pm 2s_{y.x}$ and $\hat{Y} \pm 3s_{y.x}$ contain about 68%, 95.4% and 99.7% observations respectively.


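To find $\sum (Y - \hat{Y})^2$, we have to calculate $\hat{Y}$ from the estimated regression line for the observed
values of X, which is not an easy task. We therefore use an alternative form obtained as below:


$\sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - a - bX_i)^2$
$= \sum Y_i(Y_i - a - bX_i) - a\sum (Y_i - a - bX_i) - b\sum X_i(Y_i - a - bX_i)$
$= \sum Y_i^2 - a\sum Y_i - b\sum X_iY_i - a\,[\sum Y_i - na - b\sum X_i] - b\,[\sum X_iY_i - a\sum X_i - b\sum X_i^2]$


But $\sum Y_i - na - b\sum X_i = 0$ and $\sum X_iY_i - a\sum X_i - b\sum X_i^2 = 0$ as they are the normal equations.


$\therefore \sum (Y_i - \hat{Y}_i)^2 = \sum Y_i^2 - a\sum Y_i - b\sum X_iY_i$


Hence $s_{y.x} = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$,


where n is the number of pairs.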
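Example 10.3 Using the data in Example 10.1,


i) find the values of $\hat{Y}$ and show that $\sum (Y - \hat{Y}) = 0$, and
ii) compute the standard error of estimate $s_{y.x}$.


The calculations needed to find the values of $\hat{Y}$ and the standard error of estimate $s_{y.x}$ are given in
the table below:


  X      Y     Ŷ = 1.47 + 2.831X     Y − Ŷ     (Y − Ŷ)²
  5     16         15.625             0.375     0.140625
  6     19         18.456             0.544     0.295936
  8     23         24.118            -1.118     1.249924
 10     28         29.780            -1.780     3.168400
 12     36         35.442             0.558     0.311364
 13     41         38.273             2.727     7.436529
 15     44         43.935             0.065     0.004225
 16     45         46.766            -1.766     3.118756
 17     50         49.597             0.403     0.162409
102    302        301.992             0.008    15.888168   (totals)


i) The estimated values $\hat{Y}$ appear in the third column of the table above, and $\sum (Y - \hat{Y})$
turns out to be 0.008. This small difference is due to rounding off.


ii) The standard error of estimate of Y on X is


$s_{y.x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\dfrac{15.888168}{9 - 2}} = \sqrt{2.269738} = 1.51$.


Using the alternative form for the calculation of $s_{y.x}$, we get


$s_{y.x} = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} = \sqrt{\dfrac{11368 - (1.47)(302) - (2.831)(3853)}{9 - 2}} = \sqrt{2.316714} = 1.52$.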
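
The small gap between 1.51 and 1.52 in Example 10.3 comes from rounding a and b, which the following sketch illustrates. As before, the nine data pairs are an assumption: they are the values consistent with the sums quoted for Example 10.1. With unrounded estimates the direct and alternative forms agree; substituting the rounded a = 1.47 and b = 2.831 into the alternative form reproduces 1.52.

```python
from math import sqrt

# (X, Y) pairs consistent with the sums quoted in Example 10.1 (an assumption).
xs = [5, 6, 8, 10, 12, 13, 15, 16, 17]
ys = [16, 19, 23, 28, 36, 41, 44, 45, 50]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)
sum_yy = sum(y * y for y in ys)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Direct form: residuals about the fitted line.
direct = sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
# Alternative form, unrounded a and b.
shortcut = sqrt((sum_yy - a * sum_y - b * sum_xy) / (n - 2))
# Alternative form with the rounded estimates used in the worked example.
shortcut_rounded = sqrt((sum_yy - 1.47 * sum_y - 2.831 * sum_xy) / (n - 2))

print(round(direct, 2), round(shortcut, 2), round(shortcut_rounded, 2))
```

The alternative form subtracts large, nearly equal quantities, so it is more sensitive to rounding in a and b than the direct form is.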
10.4.5 Co-efficient of Determination. The variability among the values of the dependent variable
Y, called the total variation, is given by $\sum (Y - \bar{Y})^2$. This is composed of two parts: (i) that which is
explained by (associated with) the regression line, i.e. $\sum (\hat{Y} - \bar{Y})^2$, and (ii) that which the regression line fails
to explain, i.e. $\sum (Y - \hat{Y})^2$ (see figure). In symbols,


$\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2$


Total variation = Unexplained variation + Explained variation


The co-efficient of determination which measures the proportion of variability of the values of the
dependent variable (Y) explained by its linear relation with the independent variable (X), is defined by the
ratio of the explained variation to the total variation. We use the symbol $\rho^2$ for the population parameter and the
symbol $r^2$ for the estimate obtained from sample. Thus the sample co-efficient of determination is
given by

$r^2 = \dfrac{\text{Explained variation}}{\text{Total variation}} = \dfrac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \dfrac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$.

An alternative form for calculating the coefficient of determination is

$r^2 = \dfrac{a\sum Y + b\sum XY - (\sum Y)^2/n}{\sum Y^2 - (\sum Y)^2/n}$.

When all the observed values fall on the regression line, then $Y = \hat{Y}$ and $\sum (\hat{Y} - \bar{Y})^2 = \sum (Y - \bar{Y})^2$, and
hence $r^2 = 1$. When the observed values are such that $\hat{Y} = \bar{Y}$, then $\sum (\hat{Y} - \bar{Y})^2 = 0$, and hence
$r^2 = 0$. This shows that $0 \le r^2 \le 1$. A value of $r^2 = 1$ signifies that 100% of the variability in the
dependent variable is associated with the regression equation. When $r^2 = 0$, it means that none of the
variability in the dependent variable is explained by X-variable. A value of $r^2 = 0.93$ indicates that 93%
of the variability in Y is explained by its linear relationship with the independent variable X and 7% of the
variation is due to chance or other factors.


Example 10.4 Taking length (Y) as dependent variable for the data in Example 10.2, calculate (i) the total variation, (ii) the unexplained variation, (iii) the explained variation, and (iv) the co-efficient of determination and interpret the coefficient.

In Example 10.2, we found that
$\sum Y = 220$, $\sum Y^2 = 5486$, $\sum XY = 3467$, $b = 1.02$, $a = 8.74$ and $n = 10$.
We now find

Total variation $= \sum (Y - \bar{Y})^2 = \sum Y^2 - (\sum Y)^2/n$
$= 5486 - (220)^2/10 = 646$

Unexplained variation $= \sum (Y - \hat{Y})^2 = \sum Y^2 - a\sum Y - b\sum XY$
$= 5486 - (8.74)(220) - (1.02)(3467)$
$= 5486 - 5459.14 = 26.86$

Explained variation = Total variation - Unexplained variation
$= 646 - 26.86 = 619.14$

The coefficient of determination, $r^2$, is given by

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{619.14}{646} = 0.958$$

A value of $r^2 = 0.958$ indicates that 95.8% of the variability in Y, the length of the spring, is explained by its linear relationship with X, the weight on the spring.
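The arithmetic of Example 10.4 can be retraced with a few lines of Python. This is only a sketch that starts from the sums and the rounded estimates quoted in the example; the variable names are illustrative and not part of the text.

```python
# Sums and rounded least-squares estimates quoted in Example 10.4.
n = 10
sum_y = 220
sum_y2 = 5486
sum_xy = 3467
a, b = 8.74, 1.02

# Total variation: sum of (Y - Ybar)^2, via the computational form.
total = sum_y2 - sum_y ** 2 / n

# Unexplained variation: sum of (Y - Yhat)^2 = sum Y^2 - a*sum Y - b*sum XY.
unexplained = sum_y2 - a * sum_y - b * sum_xy

# Explained variation is what remains of the total.
explained = total - unexplained

# Co-efficient of determination: explained share of the total variation.
r_squared = explained / total

print(round(total, 2), round(unexplained, 2), round(explained, 2), round(r_squared, 3))
```

Because a and b are rounded to two decimals, the unexplained variation is itself approximate; carrying more decimals in a and b would change the last digits slightly.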


10.5 CORRELATION 


Correlation, like covariance, is a measure of the degree to which any two variables vary together. In other words, two variables are said to be correlated if they tend to simultaneously vary in some direction. If both the variables tend to increase (or decrease) together, the correlation is said to be direct or positive, e.g. the length of an iron bar will increase as the temperature increases. If one variable tends to increase as the other variable decreases, the correlation is said to be negative or inverse, e.g. the volume of gas will decrease as the pressure increases. It is worth remarking that in correlation, we assess the strength of the relationship (or interdependence) between two variables; both the variables are random variables, and they are treated symmetrically, i.e. there is no distinction between dependent and independent variable. In regression, by contrast, we are interested in determining the dependence of one variable that is random, upon the other variable that is non-random or fixed, and in predicting the average value of the dependent variable by using the known values of the other variable.


10.5.1 Pearson Product Moment Correlation Co-efficient. A numerical measure of strength in the linear relationship between any two variables is called the Pearson's product moment correlation co-efficient or sometimes, the coefficient of simple correlation or total correlation. The sample linear correlation coefficient for n pairs of observations $(X_i, Y_i)$, usually denoted by the letter r, is defined by

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

The population correlation co-efficient for a bivariate distribution, denoted by $\rho$, has already been defined as

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

For computational purposes, we have an alternative form of r as

$$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{[\sum X^2 - (\sum X)^2/n]\,[\sum Y^2 - (\sum Y)^2/n]}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$

This is a more convenient and useful form, especially when $\bar{X}$ and $\bar{Y}$ are not integers. The coefficient of correlation r is a pure number (i.e. independent of the units in which the variables are measured) and it assumes values that can range from +1 for perfect positive linear relationship, to -1 for perfect negative linear relationship with the intermediate value of zero indicating no linear relationship between X and Y. The sign of r indicates the direction of the relationship or correlation.
It is important to note that r = 0 does not mean that there is no relationship at all. For example, if all the observed values lie exactly on a circle, there is a perfect non-linear relationship between the variables but r will have a value of zero as r only measures the linear correlation.
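The computational form of r, and the remark that r = 0 does not rule out a non-linear relationship, can be illustrated with a short sketch. The function name `pearson_r` and the small data sets are assumptions made for this illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient using the computational form
    r = (n*SXY - SX*SY) / sqrt((n*SXX - SX^2) * (n*SYY - SY^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

# Perfect positive and perfect negative linear relationships.
line_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
line_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

# Four points lying exactly on the unit circle: a perfect relationship,
# but not a linear one, so r comes out as zero.
circle = pearson_r([1, 0, -1, 0], [0, 1, 0, -1])

print(line_up, line_down, circle)
```

The function divides by zero when either variable is constant, which matches the fact that r is undefined in that case.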


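The linear correlation co-efficient is also the square root of the linear co-efficient of determination, $r^2$. The estimated regression line may be written as

$$\hat{Y} = \bar{Y} + b(X - \bar{X})$$
or $$\hat{Y} - \bar{Y} = b(X - \bar{X})$$

Squaring both sides and summing over the observations, we get

$$\sum (\hat{Y} - \bar{Y})^2 = b^2 \sum (X - \bar{X})^2$$

Substituting in the ratio, we find

$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{b^2 \sum (X - \bar{X})^2}{\sum (Y - \bar{Y})^2} = \left[\frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}\right]^2 \frac{\sum (X - \bar{X})^2}{\sum (Y - \bar{Y})^2} = \frac{[\sum (X - \bar{X})(Y - \bar{Y})]^2}{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}$$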


Example 10.5 Calculate the co-efficient of correlation between X and Y from the following data:


(P.U., В.А./В.ӛс. 1973) 


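$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{13}{\sqrt{(10)(26)}} = \frac{13}{16.1} = 0.81$$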


Alternatively, the following table is set up for calculation of r. 


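$$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{[\sum X^2 - (\sum X)^2/n]\,[\sum Y^2 - (\sum Y)^2/n]}} = \frac{88 - (15)(25)/5}{\sqrt{[55 - (15)^2/5]\,[151 - (25)^2/5]}} = \frac{13}{\sqrt{10 \times 26}} = 0.81$$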

10.5.2 Correlation and Causation. The fact that correlation exists between two variables does not imply any cause-and-effect relationship. Two unrelated variables, such as the sale of bananas and the death rate from cancer in a city, may produce a high positive correlation which may be due to a third unknown variable (namely, the city population). The larger the city, the more consumption of bananas and the higher will be the death rate from cancer. Clearly, this is a false or a merely incidental correlation which is the result of a third variable, the city size. Such a false correlation between two unconnected variables is called nonsense or spurious correlation. We therefore should be very careful in interpreting the correlation coefficient as a measure of relationship or interdependence between two variables.


10.5.3 Properties of r. The sample correlation co-efficient r has the following properties:
i) The correlation co-efficient r is symmetrical with respect to the variables X and Y, i.e. $r_{XY} = r_{YX}$.
ii) The correlation co-efficient lies between -1 and +1, i.e. $-1 \le r \le 1$.
iii) The correlation co-efficient is independent of the origin and scale.

Proof: Let u and v be the two new variables defined by $u = \dfrac{X - a}{h}$ and $v = \dfrac{Y - b}{k}$, so that $X = a + hu$ and $Y = b + kv$, where a and b are the new origins and h and k are the units of measurement.

Let $r_{XY}$ denote the correlation co-efficient between X and Y and $r_{uv}$ the correlation co-efficient between u and v.

Substituting these values in $r_{XY}$, viz.

$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

we get

$$r_{XY} = \frac{\sum [(a + hu) - (a + h\bar{u})]\,[(b + kv) - (b + k\bar{v})]}{\sqrt{\sum [(a + hu) - (a + h\bar{u})]^2 \sum [(b + kv) - (b + k\bar{v})]^2}}$$

where $\bar{X} = a + h\bar{u}$ and $\bar{Y} = b + k\bar{v}$. Therefore

$$r_{XY} = \frac{hk \sum (u - \bar{u})(v - \bar{v})}{hk \sqrt{\sum (u - \bar{u})^2 \sum (v - \bar{v})^2}} = r_{uv}$$

This property is very useful in numerical evaluation of r since due to this property, we can choose a convenient origin and scale.
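Property (iii) is easy to check numerically. The sketch below uses hypothetical data and hypothetical choices of origin and scale (a = 67, h = 3, b = 100, k = 1, with h and k positive); none of these values come from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient from deviations about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((p - xbar) * (q - ybar) for p, q in zip(x, y))
    sxx = sum((p - xbar) ** 2 for p in x)
    syy = sum((q - ybar) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations.
x = [61, 64, 67, 70, 73]
y = [105, 118, 112, 130, 127]

# Hypothetical change of origin (a, b) and scale (h, k), with h, k > 0.
a, h = 67, 3
b, k = 100, 1
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 6), round(r_uv, 6))  # the two values agree
```

Working with u and v keeps the numbers small, which is the practical reason for choosing a convenient origin and scale in hand calculation.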


iv) In case of a bivariate population where both X and Y are random variables, r is the geometric mean between the two regression co-efficients.

That is, if $b_{YX}$ is the regression coefficient of the regression line of Y on X and $b_{XY}$ is the regression coefficient of the regression line of X on Y, and r is the coefficient of correlation, then $r^2 = b_{YX} \cdot b_{XY}$ implies

$$r = \pm\sqrt{b_{YX}\, b_{XY}}$$

Since the signs of the regression coefficients depend on the same expression $\sum (X - \bar{X})(Y - \bar{Y})$, so $b_{YX}$ and $b_{XY}$ are both positive or $b_{YX}$ and $b_{XY}$ are both negative. Therefore

$r = +\sqrt{b_{YX}\, b_{XY}}$, if $b_{YX}$ and $b_{XY}$ are positive,
$r = -\sqrt{b_{YX}\, b_{XY}}$, if $b_{YX}$ and $b_{XY}$ are negative.

The value of r always takes the same sign as the regression coefficients.


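v) The regression co-efficients and the regression lines for a bivariate population, by using the value of the correlation co-efficient, may be expressed as

$$b_{YX} = r\,\frac{s_Y}{s_X}, \qquad b_{XY} = r\,\frac{s_X}{s_Y}$$

$$Y - \bar{Y} = r\,\frac{s_Y}{s_X}(X - \bar{X}); \quad \text{and} \quad X - \bar{X} = r\,\frac{s_X}{s_Y}(Y - \bar{Y}),$$

where the letters have their usual meaning.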


Example 10.6 Calculate the co-efficient of correlation between the values of X and Y given below: 


125 137 156 112 107 136 123 108 


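Let u = X - 69 and v = Y - 112. Then $r_{XY} = r_{uv}$. The calculations needed to find r are given in the table on the next page:

$$r = \frac{\sum uv - (\sum u)(\sum v)/n}{\sqrt{\left[\sum u^2 - \dfrac{(\sum u)^2}{n}\right]\left[\sum v^2 - \dfrac{(\sum v)^2}{n}\right]}}$$

$$= \frac{2160 - (48)(108)/8}{\sqrt{\left[1530 - \dfrac{(48)^2}{8}\right]\left[3468 - \dfrac{(108)^2}{8}\right]}}$$

$$= \frac{2160 - 648}{\sqrt{(1530 - 288)(3468 - 1458)}} = \frac{1512}{1580} = 0.96$$

Hence the correlation coefficient between X and Y is 0.96.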


Example 10.7 If $b_{ij}$ is the regression coefficient of $X_i$ on $X_j$, then calculate the product moment coefficient of correlation in each case, given

i) $b_{12} = -0.1$, $b_{21} = -0.4$   ii) $b_{13} = 0.27$, $b_{31} = 0.6$
iii) $b_{23} = 0.67$, $b_{32} = 0.38$.

The product moment coefficient of correlation between $X_i$ and $X_j$ is given by

$$r_{ij} = \pm\sqrt{b_{ij} \times b_{ji}}$$

i) Here $b_{12} = -0.1$ and $b_{21} = -0.4$, so

$$r_{12} = -\sqrt{(-0.1)(-0.4)} = -0.20.$$

r is negative since both regression coefficients are negative.

ii) Here both regression coefficients are positive, so r is positive. Thus

$$r_{13} = +\sqrt{b_{13} \times b_{31}} = +\sqrt{(0.27)(0.6)} = +0.40.$$

iii) Here we have

$$r_{23} = +\sqrt{(0.67)(0.38)} = 0.50 \quad (\because\ b_{23} \text{ and } b_{32} \text{ are positive})$$
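The rule used in Example 10.7 (take the square root of the product of the two regression coefficients and attach their common sign) can be written as a small function. The function name is hypothetical, and the guard against coefficients of opposite sign is an added assumption, reflecting the fact that such a pair cannot come from the same data.

```python
from math import sqrt

def r_from_regression_coefficients(b_yx, b_xy):
    """Correlation coefficient as the signed geometric mean of the
    two regression coefficients."""
    if b_yx * b_xy < 0:
        raise ValueError("the two regression coefficients must share a sign")
    magnitude = sqrt(b_yx * b_xy)
    return -magnitude if b_yx < 0 else magnitude

# The three cases of Example 10.7.
cases = [(-0.1, -0.4), (0.27, 0.6), (0.67, 0.38)]
results = [round(r_from_regression_coefficients(p, q), 2) for p, q in cases]
print(results)
```

A product of the two coefficients greater than 1 would give |r| > 1, which signals inconsistent input; the sketch does not check for that.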


10.5.4 Correlation Co-efficient for Grouped Data. In a simple frequency table, the data are arranged with respect to one variable only. If the arrangement is made according to two variables simultaneously in, say, m columns and k rows, the frequency table thus obtained is called a correlation table or a bivariate frequency table. The number of observations falling in the (i, j)th cell is called the cell frequency and is denoted by $f_{ij}$. The correlation co-efficient, if it exists, can be calculated from a two-way frequency table by using the class midpoints as the value of the observations. The formula for r then becomes

$$r = \frac{n\sum\sum f_{ij} X_i Y_j - (\sum f_{i.} X_i)(\sum f_{.j} Y_j)}{\sqrt{[n\sum f_{i.} X_i^2 - (\sum f_{i.} X_i)^2]\,[n\sum f_{.j} Y_j^2 - (\sum f_{.j} Y_j)^2]}}$$

where $f_{.j} = \sum_i f_{ij}$ is the frequency of Y values, $f_{i.} = \sum_j f_{ij}$ is the frequency of X values and n is the total frequency.


Example 10.8 Calculate the co-efficient of correlation from the table given below:


Grades in 3" Mathematics (X) 


eie Total 
EA тет 


4 10 
1 6 16 
5 8 24 
9 2 21 
6 17 


ЕСИЕЛЕЛЕЛЕЛЕЛЕІЛЕІ 


(P.U., B.AJB.Sc. 1968) 


Let us introduce two new variables u and v given by the relations $u = \dfrac{X - 64.5}{h}$ and $v = \dfrac{Y - 74.5}{k}$, where h and k are the class intervals of X and Y. The calculations needed for finding r are arranged in the table on page (412).
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-4 
IPTE 
Бата беш 


5ЖЕЛЕЛЕЛЕГІ 


ET 
"xi > 
p 8 ч n 
Лим су 39 | 125 | «4 —— T 


The number in the corner of each cell represents the product $f_{ij} u_i v_j$, where $f_{ij}$ is the cell frequency. Thus, for example, $f_{ij} u_i v_j = 2(1)(2) = 4$, and so on. The totals in the last column and the last row are equal and represent $\sum\sum f_{ij} u_i v_j$.

$$r = \frac{n\sum fuv - (\sum fu)(\sum fv)}{\sqrt{[n\sum fu^2 - (\sum fu)^2]\,[n\sum fv^2 - (\sum fv)^2]}}$$

(subscripts dropped for convenience)

$$= \frac{(100)(125) - (64)(-55)}{\sqrt{[(100)(236) - (64)^2]\,[(100)(253) - (-55)^2]}}$$

$$= \frac{16020}{\sqrt{(19504)(22275)}} = 0.77$$
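The grouped-data formula can be sketched in Python as follows. The function `grouped_r` and the small 2 x 2 table are hypothetical illustrations; only the coded totals at the end (n = 100, Σfu = 64, Σfv = -55, Σfu² = 236, Σfv² = 253, Σfuv = 125) are taken from Example 10.8.

```python
from math import sqrt

def grouped_r(freq, x_mid, y_mid):
    """Correlation from a bivariate frequency table.
    freq[i][j] is the frequency of the cell (x_mid[i], y_mid[j])."""
    n = sum(sum(row) for row in freq)
    f_x = [sum(row) for row in freq]          # marginal frequencies of X
    f_y = [sum(col) for col in zip(*freq)]    # marginal frequencies of Y
    s_x = sum(f * x for f, x in zip(f_x, x_mid))
    s_y = sum(f * y for f, y in zip(f_y, y_mid))
    s_xx = sum(f * x * x for f, x in zip(f_x, x_mid))
    s_yy = sum(f * y * y for f, y in zip(f_y, y_mid))
    s_xy = sum(freq[i][j] * x_mid[i] * y_mid[j]
               for i in range(len(x_mid)) for j in range(len(y_mid)))
    num = n * s_xy - s_x * s_y
    den = sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
    return num / den

# Hypothetical 2 x 2 table with midpoints 0 and 1 on each variable.
r_small = grouped_r([[3, 1], [1, 3]], [0, 1], [0, 1])

# Coded totals quoted in Example 10.8, plugged into the same formula.
n, s_u, s_v, s_uu, s_vv, s_uv = 100, 64, -55, 236, 253, 125
r_example = (n * s_uv - s_u * s_v) / sqrt(
    (n * s_uu - s_u ** 2) * (n * s_vv - s_v ** 2))

print(r_small, round(r_example, 2))
```

Since r is independent of origin and scale, the coded values u and v give the same r as the original class midpoints.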


Example 10.9 (a) Correlation between X and Y is r, show that correlation between aX and bY is r or -r according as a and b have the same or different signs.

b) Find correlation between X and Y connected by
aX + bY + c = 0. (P.U., B.A. (Hons) Part-II,


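a) Let u = aX so that $\bar{u} = a\bar{X}$,
and v = bY so that $\bar{v} = b\bar{Y}$.
Then $(u - \bar{u}) = a(X - \bar{X})$ and $(v - \bar{v}) = b(Y - \bar{Y})$.
By definition, we have

$$r_{uv} = \frac{\sum (u - \bar{u})(v - \bar{v})}{\sqrt{\sum (u - \bar{u})^2 \sum (v - \bar{v})^2}} = \frac{ab \sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{a^2 \sum (X - \bar{X})^2 \; b^2 \sum (Y - \bar{Y})^2}} = \frac{ab}{\sqrt{a^2 b^2}}\; r_{XY}$$

= +r, if a and b are of the same signs,
= -r, if a and b are of the different signs.

b) We are given aX + bY + c = 0.

Thus $a\sum X + b\sum Y + nc = 0$, where n is the number of pairs of values $(X_i, Y_i)$.

Dividing by n, we get

$a\bar{X} + b\bar{Y} + c = 0$, $\bar{X}$ and $\bar{Y}$ being the means of X and Y sets of observations. Subtracting, we have

$$a(X - \bar{X}) + b(Y - \bar{Y}) = 0 \quad \text{or} \quad (X - \bar{X}) = -\frac{b}{a}(Y - \bar{Y})$$

Now

$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{-\dfrac{b}{a} \sum (Y - \bar{Y})^2}{\sqrt{\dfrac{b^2}{a^2} \sum (Y - \bar{Y})^2 \sum (Y - \bar{Y})^2}} = \frac{-b/a}{\sqrt{b^2/a^2}}$$

= -1, if a and b are of the same signs,

= +1, if a and b are of the opposite signs.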
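Part (b) of Example 10.9 can be checked numerically. In the sketch below the helper names and the particular values of a, b, c and X are hypothetical; Y is generated from the exact relation aX + bY + c = 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient from deviations about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((p - xbar) * (q - ybar) for p, q in zip(x, y))
    sxx = sum((p - xbar) ** 2 for p in x)
    syy = sum((q - ybar) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

def r_for_line(a, b, c, xs):
    """Correlation of (X, Y) pairs that satisfy aX + bY + c = 0 exactly."""
    ys = [-(a * x + c) / b for x in xs]
    return pearson_r(xs, ys)

xs = [0, 1, 2, 5]
r_same_sign = r_for_line(2, 3, -6, xs)       # a and b both positive
r_opposite_sign = r_for_line(2, -3, 6, xs)   # a and b of opposite signs
print(round(r_same_sign, 6), round(r_opposite_sign, 6))
```

With a and b of the same sign the line slopes downward, so the correlation is -1; with opposite signs it slopes upward and the correlation is +1.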


10.6 RANK CORRELATION


Sometimes, the actual measurements or counts of individuals or objects are either not available or accurate assessment is not possible. They are then arranged in order according to some characteristic of interest. Such an ordered arrangement is called a ranking and the order given to an individual or object is called its rank. The correlation between two such sets of rankings is known as Rank Correlation.
10.6.1 Derivation of Rank Correlation. Let a set of n objects be ranked with respect to character A as $x_1, x_2, \ldots, x_i, \ldots, x_n$, and according to character B as $y_1, y_2, \ldots, y_i, \ldots, y_n$. We assume that no two or more objects are given the same ranks (i.e. are tied). Then obviously $x_i$ and $y_i$ are some two numbers from 1 to n.

Since both $x_i$ and $y_i$ are the first n natural numbers, therefore, we have

$$\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i = 1 + 2 + 3 + \ldots + n = \frac{n(n+1)}{2}$$

$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2 = 1^2 + 2^2 + 3^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6}$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2 - 1)}{12}$$

Let $d_i$ denote the difference in ranks assigned to the ith individual or object, i.e. $d_i = x_i - y_i$. Then

$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (x_i - y_i)^2 = \sum (x_i^2 + y_i^2 - 2x_i y_i) = \sum x_i^2 + \sum y_i^2 - 2\sum x_i y_i$$

Substituting for $\sum x_i^2$ and $\sum y_i^2$, we get

$$\sum d_i^2 = \frac{n(n+1)(2n+1)}{3} - 2\sum x_i y_i \quad \text{or} \quad \sum x_i y_i = \frac{n(n+1)(2n+1)}{6} - \frac{1}{2}\sum d_i^2$$

The product moment co-efficient of correlation between the two sets of rankings is

$$r = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

$$= \frac{\dfrac{n(n+1)(2n+1)}{6} - \dfrac{1}{2}\sum d_i^2 - \dfrac{n(n+1)^2}{4}}{\dfrac{n(n^2 - 1)}{12}}$$

$$= \frac{\dfrac{n(n^2 - 1)}{12} - \dfrac{1}{2}\sum d_i^2}{\dfrac{n(n^2 - 1)}{12}} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

This formula is usually denoted by $r_s$ in order to have a distinction. It is often called Spearman's co-efficient of rank correlation, in honour of the psychometrician Charles Edward Spearman (1863-1945), who first developed the procedure in 1904.


It is to be noted that $\sum d_i^2$ has the least value and is zero when the numbers are in complete agreement. When they are in complete disagreement, $\sum d_i^2$ attains the maximum value and is equal to $\dfrac{n(n^2 - 1)}{3}$. Substituting these values in the formula, we see that

$r_s = 1$ for $\sum d_i^2 = 0$, and

$r_s = -1$ for $\sum d_i^2 = \dfrac{n(n^2 - 1)}{3}$.

Thus $r_s$ also lies between -1 and +1.
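Spearman's formula and its two extreme values can be checked with a short sketch. The function name `spearman_rs` and the five-object rankings are hypothetical; the formula assumes untied ranks, as in the derivation above.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman's co-efficient of rank correlation for untied ranks:
    r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

ranks = [1, 2, 3, 4, 5]

agree = spearman_rs(ranks, ranks)            # complete agreement, sum d^2 = 0
reverse = spearman_rs(ranks, ranks[::-1])    # complete disagreement, sum d^2 = 40
mixed = spearman_rs(ranks, [2, 1, 4, 3, 5])  # sum d^2 = 4

print(agree, reverse, mixed)
```

For n = 5 the maximum of Σd² is n(n² - 1)/3 = 40, which is exactly the value produced by the reversed ranking.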


Example 10.10 Find the co-efficient of rank correlation from the following rankings of 10 students in Statistics and Mathematics.


Statistics (x): 1 2 
Mathematics (y): 2 4 


7 87229 10 
8 10 6 9 
(P.U., В.А. (Hons.) Part-I, 1964) 


% 


о ос іс t Бо м 
Hence, using Spearman’s co-efficient of rank correlation, we get


2 
r, „б бш, OR os 0-08. 
n(n? -1) 10х99 


This indicates a high correlation between Statistics and Mathematics. 
10.6.2 Rank Correlation for Tied Ranks. The Spearman's co-efficient of rank correlation applies only when no ties are present. In case there are ties in ranks, the ranks are adjusted by assigning the mean of the ranks which the tied objects or observations would have if they were ordered. For example, if two objects or observations are tied for fourth and fifth, they are both given the mean rank of 4 and 5, i.e. 4.5. The sum of adjusted ranks remains the same but $\sum (x_i - \bar{x})^2$ and $\sum (y_i - \bar{y})^2$ need no longer equal $\dfrac{n(n^2 - 1)}{12}$. It has been shown that each set of ties involving t observations reduces the value of $\sum d^2$ by a quantity equal to $\dfrac{1}{12}(t^3 - t)$. Tied ranks may therefore be handled in two ways:

First, for each tie, add a quantity $\dfrac{1}{12}(t^3 - t)$ to $\sum d^2$ before substituting the values in Spearman’s co-efficient of rank correlation, in order to adjust the formula for the tied observations.

Second, use the product moment co-efficient of correlation to find the correlation between the two sets of adjusted ranks.


Example 10.11 Two members of a selection committee rank eight persons according to their suitability for promotion as follows:


We observe that both sets of rankings contain ties. The coefficient of rank correlation is therefore calculated as below:


For tie between B and C (first rankings), t = 2 and for E, F and G (second rankings), t = 3. Hence the quantity to be added to $\sum d^2$ is

$$\frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) = 0.5 + 2 = 2.5$$

Hence $$r_s = 1 - \frac{6(8.5 + 2.5)}{8 \times 63} = 1 - \frac{66}{504} = 1 - 0.131 = 0.869.$$


Alternative Method:

We see that the first member has tied B and C, while the second member has tied E, F and G. Let us denote the ranks given by the first member by $x_i$ and those of the second member by $y_i$. Then we proceed as below:

$$r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{[\sum x^2 - (\sum x)^2/n]\,[\sum y^2 - (\sum y)^2/n]}}$$

$$= \frac{198.5 - (36)(36)/8}{\sqrt{[203.5 - (36)^2/8]\,[202 - (36)^2/8]}}$$

$$= \frac{198.5 - 162}{\sqrt{(203.5 - 162)(202 - 162)}} = \frac{36.5}{\sqrt{(41.5)(40)}} = \frac{36.5}{40.74} = 0.896,$$

which indicates a high degree of agreement between the two members.
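Both ways of handling ties can be sketched in Python. The scores below are hypothetical (the rankings of Example 10.11 are not reproduced here); they were chosen only so that the first ranking ties two objects and the second ties three. All function names are illustrative.

```python
from collections import Counter
from math import sqrt

def average_ranks(scores):
    """Rank from largest (rank 1) to smallest; tied scores share the mean rank."""
    positions = {}
    for pos, s in enumerate(sorted(scores, reverse=True), start=1):
        positions.setdefault(s, []).append(pos)
    return [sum(positions[s]) / len(positions[s]) for s in scores]

def rs_with_tie_correction(x, y):
    """First method: add (t^3 - t)/12 to sum d^2 for every set of t tied ranks."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    correction = sum((t ** 3 - t) / 12
                     for ranks in (x, y)
                     for t in Counter(ranks).values())
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Second method: product moment correlation of the adjusted ranks."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

# Hypothetical scores given by two judges to eight persons.
x = average_ranks([90, 80, 80, 70, 60, 50, 40, 30])   # one tie with t = 2
y = average_ranks([85, 90, 80, 70, 55, 55, 55, 30])   # one tie with t = 3

rs_corrected = rs_with_tie_correction(x, y)
r_product_moment = pearson_r(x, y)
print(round(rs_corrected, 3), round(r_product_moment, 3))
```

As in Example 10.11, the two methods give close but not identical values, because the correction term only approximates the effect of the ties.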


10.6.3 Co-efficient of Concordance. The Spearman's co-efficient of rank correlation measures agreement between two sets of rankings only, but in practice, the individuals or objects are sometimes ranked by more than two people. We then need a co-efficient to measure agreement among more than two sets of rankings. Such a co-efficient is obtained as below:

Let there be m rankings of n individuals or objects instead of two. Obviously in case of complete agreement, the rank totals will form the series m, 2m, 3m, ..., nm.

The mean of these totals is

$$\bar{X} = (m + 2m + 3m + \ldots + nm) \div n = \frac{m(1 + 2 + 3 + \ldots + n)}{n} = \frac{m(n+1)}{2}$$

and the variance of these sums, which is the maximum possible, is

$$\mathrm{Var(Total)} = \frac{1}{n}\left[m^2 + (2m)^2 + (3m)^2 + \ldots + (nm)^2\right] - \left[\frac{m(n+1)}{2}\right]^2$$

$$= \frac{m^2}{n}\left[1^2 + 2^2 + 3^2 + \ldots + n^2\right] - \frac{m^2(n+1)^2}{4}$$

$$= \frac{m^2(n+1)(2n+1)}{6} - \frac{m^2(n+1)^2}{4} = \frac{m^2(n^2 - 1)}{12}$$

But the totals of observed ranks will not necessarily be the same. Let S denote the sum of squares of deviations of the totals of the observed ranks from their common mean, i.e. $\dfrac{m(n+1)}{2}$. The Co-efficient of Concordance, W, is defined as the ratio of the variance of the totals of the observed ranks to the variance in case of complete agreement. Thus, we have

$$W = \frac{S/n}{m^2(n^2 - 1)/12} = \frac{12S}{m^2(n^3 - n)}$$

This co-efficient is due to Maurice G. Kendall (1907-1983) and varies from 0 to 1. When W = 1, it represents complete agreement.


Example 10.12 The following data give rankings of six persons for their ability by three judges. Calculate the co-efficient of concordance.

(P.U., B.A. (Hons.), P

Here the totals of the observed ranks are 9, 5, 14, 12, 10 and 13; m = 3 and n = 6 so that

$$\frac{m(n+1)}{2} = \frac{3(6+1)}{2} = 10.5.$$

Thus $S = (9 - 10.5)^2 + (5 - 10.5)^2 + (14 - 10.5)^2 + (12 - 10.5)^2 + (10 - 10.5)^2 + (13 - 10.5)^2$
$= (-1.5)^2 + (-5.5)^2 + (3.5)^2 + (1.5)^2 + (-0.5)^2 + (2.5)^2 = 53.50$

Hence $$W = \frac{12S}{m^2(n^3 - n)} = \frac{12(53.5)}{9(216 - 6)} = \frac{642}{1890} = 0.34.$$
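The calculation of W in Example 10.12 can be retraced from the rank totals alone. The function name `concordance_w` is hypothetical; the totals 9, 5, 14, 12, 10, 13 and m = 3 are those quoted in the example.

```python
def concordance_w(rank_totals, m):
    """Co-efficient of concordance W = 12 S / (m^2 (n^3 - n)),
    where S is the sum of squared deviations of the rank totals
    from their common mean m(n + 1)/2."""
    n = len(rank_totals)
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in rank_totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Rank totals from Example 10.12: three judges, six persons.
w_example = concordance_w([9, 5, 14, 12, 10, 13], m=3)

# Complete agreement: totals m, 2m, ..., nm give W = 1.
w_agree = concordance_w([3, 6, 9, 12, 15, 18], m=3)

print(round(w_example, 2), w_agree)
```

The sketch assumes untied rankings; no tie correction is applied.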
EXERCISES


OBJECTIVE


Answer ‘True’ or ‘False’. If the statement is not true then replace the underlined words with words 
that make the statement true: 


i) A high value of correlation between Y and X indicates a high likelihood of a cause and effect relationship between Y and X.

ii) Correlation analysis finds the equation of the line for two variables.

iii) The co-efficient of correlation lies between 0 and +1.

iv) The correlation co-efficient is not independent of the origin and scale.

v) Regression analysis measures the strength of the linear relationship between two variables.

vi) In regression analysis X and Y must both be normally distributed.

vii) The method of least squares gives the line of best fit.

viii) If the co-efficient of determination $r^2$ is equal to 7, then it indicates that 50% of the variation is due to chance or other factors.

ix) If the slope of the regression line has a negative sign, then the coefficient of determination also is negative.

x) If all the points in a scatter diagram lie on the regression line, then the standard error of estimate equals positive value.


MULTIPLE CHOICE


i) When the slope of the regression line is negative, the following statistic is also negative
a) r SS 

b г 

c) Standard error of estimate 


d) Standard error of slope co-efficient 


ii) If there is no linear relationship between the two variables then which one of the following does not hold?

a) a = 0
b) b = 0
c) r = 0
d) The regression line is either vertical or horizontal.




.d) Itis the geometric mean between the two regression coefficients 
iii) If the correlation co-efficient r = 0.7, then the proportion of variation for Y explained by X is
a) 0.49
b) 0.50
c) 0.70

d) 0.70


iv) The dependent variable is also known as
a) Explained variable

b) Response variable

c) Predicted variable

d) All of above


v) In the regression equation $Y = \alpha + \beta X + \varepsilon$, both X and Y variables are

a) Random
b) Fixed

c) X is fixed and Y is random

d) Y is fixed and X is random


vi) The variation of the Y values around the regression line is measured by
а) X(r-7) №“ 
ы EY- i 
) “0-77 > 
d) None ofabove e 
vii) If both the dependent and independent variables increase simultaneously, the correlation coefficient will be in the range of
a) 0 to +1
b) 0 to -1
c) 1 to 2
d) -1 to +1


viii) Which of the following statements is incorrect about correlation coefficient?
a) It passes through the means of the data

b) It is symmetrical with respect to X and Y

c) It is independent of origin and scale

d) It is the geometric mean between the two regression coefficients




ix) If the unexplained variation between variables X and Y is 0.40 then $r^2$ is
a) 0.75
b) 0.60
c) 0.40
d) None of the above


x) The strength of a linear relationship between two variables Y and X is measured by 
а) ; г 

b) Ы, 

с) г 

d) None of above


SUBJECTIVE


Explain what is meant by (i) regression, (ii) regressand, (iii) regressor, and (iv) regression co-efficient.


Differentiate between a deterministic and a probabilistic relationship, giving examples.


What is a scatter diagram? Describe its role in the theory of regression.


What is a linear regression model? Explain the assumptions underlying the linear regression model.


Explain the principle of least-squares.


Explain briefly how the principle of least squares is used to find a regression line based on a sample of size n. Illustrate on a rough sketch the distances whose squares are minimized, taking care to distinguish the dependent and independent variables.


Find least-squares estimates of parameters in a simple linear regression model $Y_i = \alpha + \beta X_i + \varepsilon_i$, where the $\varepsilon_i$'s are distributed independently with mean zero and constant variance.


What are the properties of the least-squares regression line?
(P.U., B.A./B.Sc. 1992)


Show that the regression line passes through the means of observations.
(P.U., D.St. 1962)


Describe briefly how you would obtain the line of regression of one variable (Y) on another variable (X), using the method of least-squares.
(P.U., B.A./B.Sc. 1975)


What is meant by the standard error of estimate? If the regression line of Y on X is given by $\hat{Y} = a + bX$, prove that the standard error of estimate $s_{y.x}$ is given by

$$s_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$

422 
10.6 


10.7 


10.8 


10.9 


10.10 Given the following sets of values: 
Given the following set of values:


reeds 15010" 175-19 
(ie ЕТО 177 8:56:59 


a) Determine the equation of the least squares regression line. 
b) Find the predicted values of Y for X = 10, 11, 15, 17, 19, 20.


c) Use the predicted values found in (b) to find the standard error of estimate. 


Given these ten pairs of (X, Y) values: 


Be eae 5n we C ES mes AMI, 


219 72:5/75:19530:23:9: 032: -43: 539/454. 48 


a) Plot a scatter diagram for the above data.


b) Carry out the necessary computations to obtain the least-squares estimates of the parameters in the simple linear regression $Y_i = \alpha + \beta X_i + \varepsilon_i$.


c) Compute the residuals and verify that they add to zero.

d) Use the regression equation to predict the value of Y when X = 10.

For each of the following data, determine the estimated regression equation $\hat{Y} = a + bX$,
à =10; Y = 20; Z XY =1,000; E X? = 2466-10. 

b) EX =528 EY 211720, EXY =\ EX? = 11,440; п 232. и 

с)  EX-1239 XY = 79; X XY 3; EX? =17,322; LY? = 293; п=100. 

d) n=10, ХХ =1710, XY QP EX? = 293,162, ХУ? = 59,390, У XY = 130,628. 
e)  X =52,¥ =237, Bai) = 2800, (Х-ХХУ-Ү)-9871. 


The owner of a retailing organization is interested in the relationship between price at which a commodity is offered for sale and the quantity sold. The following sample data have been collected.


Price:          25   45   30   50   35   40   65   75   70   60

Quantity sold: 118  105  112  100  111  108   95   88   91   96


a) Plot a scatter diagram for the above data.


b) Using the method of least squares, determine the equation for the estimated regression line. Plot this line on the scatter diagram.


c) Calculate the standard deviation of regression, $s_{y.x}$. (B.Z.U., M.A. Econ.


NEC T Deere Oa ep REI 
3227.45. 10. 2:0) ^15-'06,..29 
a) Compute the least-squares regression equation for Y values on X values, that is the equation $\hat{Y} = a + bX$.
b) Compute the standard error of estimate, $s_{y.x}$.
c) Compute the least-squares regression equation for X values on Y values, that is the equation $\hat{X} = a_0 + b_0 Y$.
d) Compute the standard error of estimate, $s_{x.y}$.
Explain what is meant by the co-efficient of determination. 
Compute the co-efficient of determination for the following data and interpret the 
co-efficient. 

Income (X) (000) 10 20 30 40 50 60 

Ее 22709920234 6:53 


Expenditure (Y) (000) 


What is the total variation, the explained variation and the unexplained variation? 


Compute (i) the total variation, (ii) the explained variation and (iii) the unexplained variation for the data in 10.11(b). How much of the variability in Y is explained by the linear regression model?
Differentiate between regression and correlation, giving examples.
(P.U., B.A./B.Sc. 1979)

Describe the properties of the correlation co-efficient.


What values may r assume? Interpret the meaning when r = -1, 0, +1.
(P.U., B.A./B.Sc. 1980)


Define the terms correlation and product moment co-efficient of correlation. Prove that the correlation co-efficient is independent of the origin and scale. (P.U., B.A./B.Sc. 1981)


Compute the correlation co-efficient between the variables X and Y represented in the following table:


Multiply each X value by 2 and add 6. Multiply each Y value by 3 and subtract 15. Find the 
correlation co-efficient between the two new sets of values, explaining why you do or do not 
obtain the same result as in part (b). 

Show that, if гүү is the correlation co-efficient calculated from a set of paired data (Xi, Yi), 
(Xa, Yo); ..., (Xm Yn)» then rav, the correlation co-efficient for u; = aX;+b and у; = cY;*d 
(with a + 0 and c = 0), is given Бу пху. 

Calculate the correlation co-efficient by first multiplying each X and Y by 10 and then 
subtracting 70 from each X and 60 from each Y. 


$2 96 70 94 109 71 90 66 84 105 
87 96 69 85 113 76 92 63 84 124 
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10.16 а) ^ Explain the term correlation. It is known that гуу =0.7. Find (i) лу, (ii) ru where и = 
and у-3/. : 


b) -Calculate the coefficient of correlation for a sample of 20-pairs of observations, given 


-Х-2,Ү- 8 EX? =180, LY? =1424 and УХЕ 404. ДӘР; чш: 




10.17 The following data were obtained from personnel records of a manufacturing firm:
X = number of years of service, Y = weekly wage rate
n = 23; $\sum X = 2{,}433$; $\sum Y = 4{,}245$; $\sum X^2 = 281{,}019$; $\sum Y^2 = 841{,}786$ and $\sum XY = 482{,}788$.


i) Compute the correlation co-efficient.


ii) If the correlation co-efficient indicates that there does exist a relationship between X and Y, compute the least-squares line of regression. What do the values of a and b signify?
(P.U., B.A./B.Sc.


10.18 Find the product moment co-efficient of correlation between c density and accident 
the following information available. Find also the coefficient of determination and interpret it.


10.19 Given marks as 


Т к oe 
Economics paper | 36 "41 46 59 46 65 31 68 41 70 36 
Physics Paper 62\М2 60 .53 36 50 42 66 44 58 65 71 


Find the co-efficient of correlation and interpret it.


10.20 Calculate the co-efficient of correlation and obtain the lines of regression of the following 


10:21 a) Find the correlation co-efficient between X and Y, given 
b) Find the co-efficient of correlation between persons employed and cloth manufactured in a textile mill. Interpret the result.


Persons employed 137 209. 113. 189 176 200 219 
Cloth manufactured (000 yds) | 23 47. 22 40. 39 51 49 


(P.U.. В.А./В.ӛс. 1960) 
10.22 The following table gives the distribution of the total population and those who are wholly or partially blind among them. Find out if there is any relation between age and blindness.


(P.U., B.A./B.Sc. 1983)
Hint. First calculate the numbers of blind and then correlate with the midpoints of age groups.


10.23 A computer while calculating the correlation co-efficient between two variables X and Y from 25 pairs of observations obtained the following sums:

$\sum X = 125$, $\sum X^2 = 650$, $\sum Y = 100$, $\sum Y^2 = 460$, $\sum XY = 508$.

It was, however, later discovered at the time of checking that he had copied down two pairs as

| X | Y |
| 6 | 14 |
| 8 | 6 |

while the correct values were

| X | Y |
| 8 | 12 |
| 6 | 8 |

Obtain the correct value of the co-efficient of correlation. (C.S.S. 1972; P.U., B.A./B.Sc. 1974)
10.24 If the equations of the least squares regression lines are:

a) Y = 20.8 - 0.219X (Y on X), and X = 16.2 - 0.785Y (X on Y);

b) Y = 2.64 + 0.648X (Y on X), and X = -1.91 + 0.917Y (X on Y);

c) Y = 1.94X + 10.83 (Y on X), and X = 0.15Y + 6.18 (X on Y);

d) Y = 15 - 1.96X (Y on X), and Y = 15.91 - 2.22X (X on Y);


Find the product moment coefficient of correlation in each case.
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10.25 Find the co-efficient of correlation for the frequency distribution of two variables given 


0) judi following Тт 


Феде Ae Also find the regression equation of Y and X. 
«ec 


26 Compute correlation co-efficient from the following солуни table for weights and 
women students. 


ы 


10.27 a) Describe in simple words the following concepts:


(i) co-efficient of correlation, (ii) scatter diagram, (iii) least squares principle,
(iv) estimate of regression co-efficient.
b) Тһе following resi were obtained for a bivariate frequency distribution after 
-125 500 А 
x = 2a sn = 66,5 fa =A E fi? = 109, 8 


transformation = and у= 


Xf? =115, X fuv = 91. Calculate the coefficient of correlation and obtain the eq 
the lines of regression in the simplest form. (P.U., B.AJB.Sc- 


10.28 If Х\, X; and X; are uncorrelated variables, each having the same standard deviation, 
co-efficient of correlation between (Хү-Х;) and (X5-X3). (P.U., B.A. Hons. Part-Il, 


10.29 a) What is rank correlation? Derive Spearman's co-efficient of rank correlation. 
(P.U., B.A./B.Sc. 1960, 71, 82, 


b) The ranks of the same 16 students in Mathematics and Physics were as follows: 


E (1, 1); (2, 10); (3,3); (4. 4); (5, 5); (6, 7); (7, 2); (8, 6), (9, 8); (10, 11); (11, 15); (12, 
14); (14, 12); (15, 16); (16, 13); the two numbers within brackets denoting the ranks 
same student in Maths, and Physics respectively. Calculate the rank correlation ci 
for proficiencies of this group in two subjects. < (P.U., В.А./В.ӛс. 
a) If n pairs of values of two variables a and b are given, where each variable is ranked in order (1 to n), show that the co-efficient of correlation between ranks is given by

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d is the difference between the ranks of a and b. (P.U., B.A./B.Sc. 1989)
b) ` Obtain the product moment coefficient of correlation between the following values: 
74 90 110 25 4.6 6.5 


| b |85 61 24 67 126 33 


Rank the values and hence find a rank correlation coefficient between the two sets. 


Describe circumstances in which you would use: (i) rank correlation co-efficient; (ii) product 
moment correlation coefficient. 


b). The following table shows how 10 students, arranged in alphabetical order, were ranked 
according to their achievements in both laboratory and lecture portions of a statistics course. 
Find the co-efficient of rank correlation. 


%. 


а Ав 5с. 1969) 


ES 


Ten competitors in a beauty contest are A > Ес S a es in the Res 


First диш 


Use the rank correlation co-efficient to discuss which pair of judges have the nearest approach to common tastes in beauty. (P.U., B.A./B.Sc., 1960, B.Sc. (Hons.) Part-I, 1971)


In a painting competition, various entries are ranked by three judges. Use Spearman’s rank correlation co-efficient to discuss which pair of judges has the nearest approach to common tastes.


(P.U., D.St., 1964) 


a) What are tied ranks? Explain how you would find the co-efficient of rank correlation for tied
ranks. 


b) Compute the co-efficient of rank correlation for the following ranks; 




10.35 Establish the formula for the ‘co-efficient of concordance’. Find the same for the following data: 
Xd: 9 ENR RA OSS. 556 кыў «Dg NOT TG 


INTRODUCTION


The technique of simple regression which involves one dependent variable and one independent variable is often inadequate in most real-world situations where a variable depends upon two or more independent variables or regressors. For example, the yield of a crop depends upon the fertility of the land, fertilizer applied, rainfall, quality of seed, etc. Likewise, the systolic blood pressure of a person depends upon one's weight, age, etc. In such cases, the technique of simple regression may be expanded to include several independent variables. A regression which involves two or more independent variables is called a multiple regression. Thus, in case of multiple linear regression where k independent variables influence the dependent variable Y, the general format of the model is

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + \varepsilon_i \qquad (i = 1, 2, \ldots, n)$$

where the $\varepsilon_i$'s are the random errors,

$\alpha$ and the $\beta_j$'s are the unknown population parameters, $\alpha$ is the intercept and $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients for variables $X_1, X_2, \ldots, X_k$ respectively,

$X_{1i}, X_{2i}, \ldots, X_{ki}$ are the fixed values of the k independent variables; the first of the two subscripts attached to each regressor denotes the variable and the second the observation number.


We assume that

i) $E(\varepsilon_i) = 0$ for all i. This implies that for given values of the X's, $E(Y_i) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki}$.
ii) $\mathrm{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$ for all i, i.e. the variance of error terms is constant.
iii) $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \ne j$, i.e. the error terms are independent of each other.
iv) $E(X \varepsilon_i) = 0$ for all regressors, i.e. $\varepsilon$ and each X variable are independent.
v) The $\varepsilon$'s are normally distributed with a mean of zero and a constant variance $\sigma^2$.
vi) We assume further in the multiple regression model that there exists no exact linear relationship between any two of the regressors.


The corresponding regression equation estimated from sample data then takes the following form

$$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

where a and $b_j$'s are the least-squares estimates of the population parameters $\alpha$ and $\beta_j$'s. The parameters $\beta_j$'s or their estimates $b_j$'s are called the partial regression co-efficients as $\beta_j$ or its estimate $b_j$ (j = 1, 2, ..., k) measures the change in the mean value of Y for a unit change in $X_j$ while all other variables remain unchanged.


11.2 MULTIPLE LINEAR REGRESSION WITH TWO REGRESSORS


For two independent variables $X_1$ and $X_2$, the predicting equation for an individual Y value is

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

and the estimated multiple linear regression based on sample data is

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

for a set of n observations, each of which is a number triple $(X_{1i}, X_{2i}, Y_i)$. The error or residual in each case is given as

$$e_i = Y_i - \hat{Y}_i = Y_i - (a + b_1 X_{1i} + b_2 X_{2i})$$


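Using the least-squares criterion, we determine those values of a, $b_1$ and $b_2$ which will minimize the sum of squared residuals, $\sum e^2$. To minimize $\sum e^2$, we find $\partial \sum e^2 / \partial a$, $\partial \sum e^2 / \partial b_1$ and $\partial \sum e^2 / \partial b_2$ and set them equal to zero. Thus

$$\frac{\partial \sum e^2}{\partial a} = -2\sum [Y - (a + b_1 X_1 + b_2 X_2)] = 0,$$
$$\frac{\partial \sum e^2}{\partial b_1} = -2\sum X_1 [Y - (a + b_1 X_1 + b_2 X_2)] = 0,$$
$$\frac{\partial \sum e^2}{\partial b_2} = -2\sum X_2 [Y - (a + b_1 X_1 + b_2 X_2)] = 0.$$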

Simplifying, we get the following three normal equations:

$$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$$
$$\sum X_1 Y = a\sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$$
$$\sum X_2 Y = a\sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2$$

The values of a, $b_1$ and $b_2$ are determined by solving these three normal equations simultaneously and are substituted into

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

to obtain the estimated multiple linear regression equation.


Example 11.1 A statistician wants to predict the incomes of restaurants, using two independent variables: the number of restaurant employees and restaurant floor area. He collected the following data:


Floor area (000 sq. ft)    Number of employees


Calculate the estimated multiple linear regression equation (i.e. $\hat{Y} = a + b_1 X_1 + b_2 X_2$) for the above data.


The estimated multiple linear regression equation is

$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

where a, $b_1$ and $b_2$ are the least-squares estimates of $\alpha$, $\beta_1$ and $\beta_2$. The three normal equations are:

$$\sum Y = na + b_1 \sum X_1 + b_2 \sum X_2$$
$$\sum X_1 Y = a\sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2$$
$$\sum X_2 Y = a\sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2.$$


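The calculations needed to find a, $b_1$ and $b_2$ are shown in the following table:


The column totals are $\sum Y = 89$, $\sum X_1 = 30$, $\sum X_2 = 52$, $\sum X_1^2 = 238$, $\sum X_2^2 = 582$, $\sum X_1 X_2 = 351$, $\sum X_1 Y = 619$ and $\sum X_2 Y = 1007$, with n = 5.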


Substituting the sums in the normal equations, we get

$$5a + 30b_1 + 52b_2 = 89$$
$$30a + 238b_1 + 351b_2 = 619$$
$$52a + 351b_1 + 582b_2 = 1007$$

Solving them simultaneously, we get
a = -1.33, $b_1$ = 0.38 and $b_2$ = 1.62.


Hence the desired estimated multiple linear regression is

$$\hat{Y} = -1.33 + 0.38X_1 + 1.62X_2.$$
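
The three normal equations of Example 11.1 can be solved mechanically. The following is a minimal sketch using Cramer's rule in plain Python; the function names `det3` and `solve_normal_equations` are illustrative assumptions, not part of the text. Solved without intermediate rounding, the system gives a of about -1.30; the value -1.33 quoted in the example arises when the slopes are first rounded to 0.38 and 1.62.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_normal_equations(n, sx1, sx2, sx1x1, sx2x2, sx1x2, sy, sx1y, sx2y):
    """Solve the three normal equations for (a, b1, b2) by Cramer's rule."""
    coeffs = [
        [n, sx1, sx2],
        [sx1, sx1x1, sx1x2],
        [sx2, sx1x2, sx2x2],
    ]
    rhs = [sy, sx1y, sx2y]
    d = det3(coeffs)
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix by the right-hand side.
        m = [row[:] for row in coeffs]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)


# Column totals of Example 11.1 (n = 5).
a, b1, b2 = solve_normal_equations(5, 30, 52, 238, 582, 351, 89, 619, 1007)
print(round(a, 4), round(b1, 4), round(b2, 4))
```

Cramer's rule is chosen here only because the system is 3 by 3; for more regressors a general linear solver would be the more practical choice.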
11.2.1 Expression of Multiple Linear Regression in Deviation Form. The computational procedure is considerably simplified by working with the deviations from the respective means of the variables. With two independent variables, the estimated multiple regression equation is

$$\hat{Y}_i = a + b_1 X_{1i} + b_2 X_{2i} \qquad (i = 1, 2, \ldots, n)$$

As the regression equation goes through the point of means, we have

$$\bar{Y} = a + b_1 \bar{X}_1 + b_2 \bar{X}_2.$$
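Subtracting, we get

$$\hat{y}_i = b_1 x_{1i} + b_2 x_{2i},$$

where $\hat{y}_i = \hat{Y}_i - \bar{Y}$, $x_{1i} = X_{1i} - \bar{X}_1$ and $x_{2i} = X_{2i} - \bar{X}_2$.
Then, with $y_i = Y_i - \bar{Y}$, $e_i = y_i - \hat{y}_i = y_i - b_1 x_{1i} - b_2 x_{2i}$ and

$$\sum e_i^2 = \sum (y_i - b_1 x_{1i} - b_2 x_{2i})^2.$$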


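Differentiating $\sum e^2$ partially w.r.t. $b_1$ and $b_2$, and equating to zero, we get

$$\frac{\partial \sum e^2}{\partial b_1} = -2\sum x_1 (y - b_1 x_1 - b_2 x_2) = 0$$
$$\frac{\partial \sum e^2}{\partial b_2} = -2\sum x_2 (y - b_1 x_1 - b_2 x_2) = 0$$

which yield, on simplification, the following two normal equations:

$$\sum x_1 y = b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 y = b_1 \sum x_1 x_2 + b_2 \sum x_2^2,$$

where the subscript i is dropped for convenience in printing.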


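Solving these two equations simultaneously, we get

$$b_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \quad \text{and} \quad b_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}.$$

Then a, the constant, is determined by

$$a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2.$$

This is an alternative approach to solving the normal equations directly.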


Example 11.2 Compute the estimated multiple linear regression $\hat{Y} = a + b_1 X_1 + b_2 X_2$ for the data in Example 11.1, using the multiple regression in the deviation form.
In Example 11.1, we found that
$\sum Y = 89$, $\sum X_1 = 30$, $\sum X_2 = 52$, $\sum X_1^2 = 238$, $\sum X_2^2 = 582$,
$\sum X_1 X_2 = 351$, $\sum X_1 Y = 619$, $\sum X_2 Y = 1007$ and n = 5.
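Now we first calculate

$$\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 238 - \frac{(30)^2}{5} = 58,$$
$$\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 582 - \frac{(52)^2}{5} = 41.2,$$
$$\sum x_1 x_2 = \sum X_1 X_2 - \frac{(\sum X_1)(\sum X_2)}{n} = 351 - \frac{(30)(52)}{5} = 39,$$
$$\sum x_1 y = \sum X_1 Y - \frac{(\sum X_1)(\sum Y)}{n} = 619 - \frac{(30)(89)}{5} = 85,$$
$$\sum x_2 y = \sum X_2 Y - \frac{(\sum X_2)(\sum Y)}{n} = 1007 - \frac{(52)(89)}{5} = 81.4.$$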

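Next, we compute the regression co-efficients and constant as follows:

$$b_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \frac{(85)(41.2) - (81.4)(39)}{(58)(41.2) - (39)^2} = \frac{327.4}{868.6} = 0.38,$$

$$b_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \frac{(81.4)(58) - (85)(39)}{(58)(41.2) - (39)^2} = \frac{1406.2}{868.6} = 1.62,$$

and $a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 = 17.8 - (0.38)(6) - (1.62)(10.4) = -1.33.$

Hence the desired multiple linear regression equation is

$$\hat{Y} = -1.33 + 0.38X_1 + 1.62X_2.$$


It is to be noted that we have exactly the same results as previously.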
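
The deviation-form procedure can be sketched for raw observations as follows. Because the individual observations of Example 11.1 are not reproduced here, the data below are hypothetical and were constructed so that Y = 1 + 2X1 + 3X2 holds exactly; the function name `fit_two_regressors` is likewise an assumption made for illustration.

```python
def fit_two_regressors(x1, x2, y):
    """Least-squares fit of Y on X1 and X2 using deviations from the means."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]          # x1 = X1 - mean(X1)
    d2 = [v - m2 for v in x2]          # x2 = X2 - mean(X2)
    dy = [v - my for v in y]           # y  = Y  - mean(Y)
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    den = s11 * s22 - s12 ** 2         # zero only if X1 and X2 are exactly collinear
    b1 = (s1y * s22 - s2y * s12) / den
    b2 = (s2y * s11 - s1y * s12) / den
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical data satisfying Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b1, b2 = fit_two_regressors(x1, x2, y)
print(a, b1, b2)
```

Since the hypothetical data lie exactly on a plane, the fit should recover the generating coefficients up to floating-point error.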


11.2.2 Standard Error of Estimate. The standard error of estimate is the standard deviation of regression. It measures the dispersion of Y values about the population multiple regression plane. For a multiple regression with two independent variables $X_1$ and $X_2$, it is denoted symbolically by $\sigma_{Y.12}$, where the subscripts indicate that Y is regressed against two independent variables $X_1$ and $X_2$.

Usually, the value of $\sigma_{Y.12}$ is not known, it is therefore estimated from sample data.

The sample standard error of estimate (unbiased estimate), denoted by $s_{Y.12}$, is given by

$$s_{Y.12} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 3}}$$

but it can be computed more readily by using the following relations:

$$\sum (Y - \hat{Y})^2 = \sum [Y - (a + b_1 X_1 + b_2 X_2)]^2 = \sum Y^2 - a\sum Y - b_1 \sum X_1 Y - b_2 \sum X_2 Y.$$

A larger value of $s_{Y.12}$ means that the multiple regression equation is of little use in estimation or prediction.
11.2.3 Co-efficient of Multiple Determination and Multiple Correlation. The co-efficient of multiple determination, which measures, as in the case of simple regression, the proportion of variation in the values of the dependent variable Y explained by its linear relation with the independent variables, is defined by the ratio of the variation in Y explained by the regression equation to the total variation. For a multiple regression with two regressors $X_1$ and $X_2$, the co-efficient of multiple determination is denoted symbolically by $R^2_{Y.12}$ and is computed by

$$R^2_{Y.12} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2},$$

where $\hat{Y} = a + b_1 X_1 + b_2 X_2$, but it can be readily computed by using the relation

$$\sum (\hat{Y} - \bar{Y})^2 = a\sum Y + b_1 \sum X_1 Y + b_2 \sum X_2 Y - (\sum Y)^2 / n.$$

The co-efficient of multiple determination lies between 0 and 1, and has the same meaning as in simple linear regression.

The positive square root of the co-efficient of multiple determination, i.e. $R_{Y.12}$, is called the co-efficient of multiple correlation. $R_{Y.12}$ measures the degree of association between Y and the regressors $X_1$ and $X_2$ combined, and is always taken to be positive.


Example 11.3 Compute the standard error of estimate, co-efficient of multiple determination and co-efficient of multiple correlation for the data in Example 11.1.


For the data in Example 11.1, we found from the regression calculation, that
$\sum Y = 89$, $\sum Y^2 = 1885$, n = 5, a = -1.33,
$\sum X_1 Y = 619$, $\sum X_2 Y = 1007$, $b_1$ = 0.38, $b_2$ = 1.62.

Therefore

$$s_{Y.12} = \sqrt{\frac{\sum Y^2 - a\sum Y - b_1 \sum X_1 Y - b_2 \sum X_2 Y}{n - 3}} = \sqrt{\frac{1885 - (-1.33)(89) - (0.38)(619) - (1.62)(1007)}{5 - 3}} = \sqrt{68.405} = 8.27,$$

which is the standard deviation of the multiple regression.
The coefficient of multiple determination is

$$R^2_{Y.12} = \frac{a\sum Y + b_1 \sum X_1 Y + b_2 \sum X_2 Y - (\sum Y)^2 / n}{\sum Y^2 - (\sum Y)^2 / n} = \frac{(-1.33)(89) + (0.38)(619) + (1.62)(1007) - (89)^2 / 5}{1885 - (89)^2 / 5} = \frac{163.99}{300.8} = 0.55.$$

This means that 55% of the variability in income is explained by its linear relationship with floor area and the number of employees.

The co-efficient of multiple correlation, $R_{Y.12}$, is

$$R_{Y.12} = \sqrt{0.55} = 0.74.$$
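
The computations of Example 11.3 can be checked with a short sketch that works from the column totals alone. The function name `fit_summary` and its argument names are illustrative assumptions; the inputs are the totals and the rounded coefficients used in the example.

```python
import math


def fit_summary(n, sy, syy, sx1y, sx2y, a, b1, b2):
    """Return (standard error of estimate, R squared, R) for two regressors."""
    explained = a * sy + b1 * sx1y + b2 * sx2y - sy ** 2 / n   # sum (Yhat - Ybar)^2
    unexplained = syy - a * sy - b1 * sx1y - b2 * sx2y         # sum (Y - Yhat)^2
    total = syy - sy ** 2 / n                                  # sum (Y - Ybar)^2
    s = math.sqrt(unexplained / (n - 3))   # n - 3: three estimated constants
    r2 = explained / total
    return s, r2, math.sqrt(r2)


s, r2, r = fit_summary(5, 89, 1885, 619, 1007, -1.33, 0.38, 1.62)
print(round(s, 2), round(r2, 2), round(r, 2))
```

Note that the explained and unexplained sums add up to the total sum of squares by construction of the two shortcut formulas, so either one determines the other.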


11.2.4 Subscript Notation. For the purposes of generalization and change of variables, it is convenient to adopt a notation due to G. Udny Yule (1871-1951). This notation involves subscripts. For example, the individual Y value in case of the multiple linear regression with two independent variables, is written as

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i.$$

Using Yule's notation, this can be written as

$$X_{1i} = \beta_{1.23} + \beta_{12.3} X_{2i} + \beta_{13.2} X_{3i} + \varepsilon_{1.23i},$$

where the variables are numbered 1, 2 and 3 by the use of subscripts. The subscripted number 1 denotes the dependent variable, 2 and 3 denote the independent variables $X_2$ and $X_3$ respectively, and $\beta_{1.23}$ is the value of $X_1$ when $X_2$ and $X_3$ are both equal to zero.

There are three subscripts attached to each parameter. The subscripts preceding the point are called primary subscripts and those following the point are known as secondary subscripts. The dependent variable is always indicated by the first primary subscript, while the second primary subscript indicates the variable to which the $\beta$ co-efficient is attached. The secondary subscript(s) indicates which other variable(s) has been included in the regression equation. The secondary subscripts, if more than one, may appear in any order.


The advantage of this notation is that it indicates the number of variables involved in the regression equation and also shows which is the dependent variable and which are the independent variables.


The estimated multiple regression equation of $X_1$ on $X_2$ and $X_3$ is

$$\hat{X}_1 = b_{1.23} + b_{12.3} X_2 + b_{13.2} X_3.$$
It should be noted that in general $b_{12.3}$ is different from $b_{12}$.
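Allowing a change of variables, the estimated regression equation of $X_2$ on $X_1$ and $X_3$ is given by

$$\hat{X}_2 = b_{2.13} + b_{21.3} X_1 + b_{23.1} X_3.$$

Similarly, the estimated regression equation of $X_3$ on $X_1$ and $X_2$ is

$$\hat{X}_3 = b_{3.12} + b_{31.2} X_1 + b_{32.1} X_2.$$

If the two variables, measured from their means, be $x_1$ and $x_2$, then the two simple regression equations of $x_1$ on $x_2$ and of $x_2$ on $x_1$ are

$$\hat{x}_1 = b_{12} x_2 \quad \text{and} \quad \hat{x}_2 = b_{21} x_1.$$

The residuals may be expressed as

$$x_{1.2} = x_1 - b_{12} x_2 \quad \text{and} \quad x_{2.1} = x_2 - b_{21} x_1.$$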


If $x_1$, $x_2$ and $x_3$ are three variables, measured from their respective means, then the multiple regression equation of $x_1$ on $x_2$ and $x_3$ is

$$\hat{x}_1 = b_{12.3} x_2 + b_{13.2} x_3$$

and its residual is expressed by

$$x_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3.$$

The two normal equations may be written as

$$\sum x_2 x_{1.23} = 0 \quad \text{and} \quad \sum x_3 x_{1.23} = 0.$$


11.2.5 Properties of Residuals. The residuals or errors have the following properties:

1. "The sum of the products of corresponding values of a variable and a residual is zero, provided the subscript of the variable is included among the secondary subscripts of the residual."

Let the regression equation (in deviation form) of $x_1$ on $x_2$ and $x_3$ be

$$\hat{x}_1 = b_{12.3} x_2 + b_{13.2} x_3.$$

Then the two normal equations for determining the b's are

$$\sum x_2 x_{1.23} = 0 = \sum x_3 x_{1.23},$$

where $x_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3$.

Similarly, the normal equations for the regression of $x_2$ on $x_1$ and $x_3$ and of $x_3$ on $x_1$ and $x_2$ are

$$\sum x_1 x_{2.13} = 0 = \sum x_3 x_{2.13}$$
$$\sum x_1 x_{3.12} = 0 = \sum x_2 x_{3.12}.$$
2. "The sum of the products (or covariance) of two residuals remains unchanged by omitting from one residual any or all of secondary subscripts which are common to both."

Let the residual defined as $x_{1.2} = x_1 - b_{12} x_2$ be considered.

Then
$$\sum x_{1.2} x_{1.23} = \sum x_{1.23} (x_1 - b_{12} x_2) = \sum x_1 x_{1.23} - b_{12} \sum x_2 x_{1.23}.$$

The second term vanishes as $\sum x_2 x_{1.23} = 0$.

Then $\sum x_{1.2} x_{1.23} = \sum x_1 x_{1.23}$.

Again
$$\sum x_{1.23}^2 = \sum x_{1.23} (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_1 x_{1.23} - b_{12.3} \sum x_2 x_{1.23} - b_{13.2} \sum x_3 x_{1.23}.$$

Here again the second and third terms vanish due to their being normal equations.

Hence $\sum x_{1.23}^2 = \sum x_1 x_{1.23} = \sum x_{1.2} x_{1.23}$.


3. "The sum of the products (or covariance) of two residuals is zero provided all the subscripts of one residual are included among the secondary subscripts of the second."


Let us consider the residuals defined by $x_{2.3}$ and $x_{1.23}$.

Then
$$\sum x_{1.23} x_{2.3} = \sum x_{1.23} (x_2 - b_{23} x_3) = \sum x_2 x_{1.23} - b_{23} \sum x_3 x_{1.23}.$$

But this vanishes because of the normal equations and property 1.
Similarly, $\sum x_{1.23} x_{3.2} = 0$.
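
The three properties of residuals can be checked numerically. The sketch below uses a small hypothetical data set (the values are invented for illustration only) and plain Python; the helper names `centered` and `dot` are assumptions, not notation from the text.

```python
def centered(values):
    """Deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of products of corresponding values."""
    return sum(p * q for p, q in zip(u, v))


# Hypothetical observations, converted to deviations x1, x2, x3.
x1 = centered([3.0, 5.0, 4.0, 8.0, 10.0, 9.0])
x2 = centered([1.0, 2.0, 4.0, 3.0, 6.0, 5.0])
x3 = centered([2.0, 1.0, 3.0, 5.0, 4.0, 7.0])

# Partial regression co-efficients of x1 on x2 and x3 (from the normal equations).
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b12_3 = (dot(x1, x2) * dot(x3, x3) - dot(x1, x3) * dot(x2, x3)) / den
b13_2 = (dot(x1, x3) * dot(x2, x2) - dot(x1, x2) * dot(x2, x3)) / den

# Residuals x_{1.23}, x_{1.2} and x_{2.3}.
x1_23 = [p - b12_3 * q - b13_2 * r for p, q, r in zip(x1, x2, x3)]
b12 = dot(x1, x2) / dot(x2, x2)
x1_2 = [p - b12 * q for p, q in zip(x1, x2)]
b23 = dot(x2, x3) / dot(x3, x3)
x2_3 = [q - b23 * r for q, r in zip(x2, x3)]

print(dot(x2, x1_23), dot(x3, x1_23))        # property 1: both near zero
print(dot(x1_2, x1_23), dot(x1_23, x1_23))   # property 2: the two sums agree
print(dot(x1_23, x2_3))                      # property 3: near zero
```

The printed sums are zero (or equal) only up to floating-point rounding, which is why the check compares against a small tolerance rather than exact zero.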


11.2.6 Multiple Regression in terms of Linear Correlation Coefficients. The multiple regression equation of a variable, say $X_1$, on other variables, say $X_2$ and $X_3$, can be sometimes expressed in terms of $r_{12}$, $r_{13}$ and $r_{23}$, the linear correlation coefficients. The multiple regression equation (in deviation form) of $x_1$ on $x_2$ and $x_3$ is given by

$$\hat{x}_1 = b_{12.3} x_2 + b_{13.2} x_3.$$

The two normal equations are obtained as

$$\sum x_1 x_2 = b_{12.3} \sum x_2^2 + b_{13.2} \sum x_2 x_3$$
$$\sum x_1 x_3 = b_{12.3} \sum x_2 x_3 + b_{13.2} \sum x_3^2$$

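Let $S_i^2$ be the variance of $x_i$ and let $r_{ij}$ be the linear correlation co-efficient between $x_i$ and $x_j$. Then writing the normal equations in terms of variances and linear correlation co-efficients, we get

$$n r_{12} S_1 S_2 = n b_{12.3} S_2^2 + n b_{13.2} r_{23} S_2 S_3$$
$$n r_{13} S_1 S_3 = n b_{12.3} r_{23} S_2 S_3 + n b_{13.2} S_3^2$$

Simplification gives

$$r_{12} S_1 = b_{12.3} S_2 + b_{13.2} r_{23} S_3, \text{ and}$$
$$r_{13} S_1 = b_{12.3} r_{23} S_2 + b_{13.2} S_3.$$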


Solving these equations simultaneously for b's, we get

$$b_{12.3} = \frac{S_1}{S_2} \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \right] \quad \text{and} \quad b_{13.2} = \frac{S_1}{S_3} \left[ \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \right].$$

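Substituting these values in the regression equation, we obtain

$$\hat{x}_1 = \frac{S_1}{S_2} \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \right] x_2 + \frac{S_1}{S_3} \left[ \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \right] x_3.$$

Or dividing both sides of the equation by $S_1$, we get

$$\frac{\hat{x}_1}{S_1} = \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \right] \frac{x_2}{S_2} + \left[ \frac{r_{13} - r_{12} r_{23}}{1 - r_{23}^2} \right] \frac{x_3}{S_3}$$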
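as the multiple regression of $x_1$ on $x_2$ and $x_3$ in terms of standard deviations and the linear correlation co-efficients of the variables involved. Similarly, the other two multiple regression equations of $x_2$ on $x_1$ and $x_3$ and of $x_3$ on $x_1$ and $x_2$ are obtained as

$$\frac{\hat{x}_2}{S_2} = \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{13}^2} \right] \frac{x_1}{S_1} + \left[ \frac{r_{23} - r_{12} r_{13}}{1 - r_{13}^2} \right] \frac{x_3}{S_3},$$

$$\frac{\hat{x}_3}{S_3} = \left[ \frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2} \right] \frac{x_1}{S_1} + \left[ \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2} \right] \frac{x_2}{S_2}.$$

To obtain the regression equations in terms of original values, we replace $x_1$ by $X_1 - \bar{X}_1$, $x_2$ by $X_2 - \bar{X}_2$ and $x_3$ by $X_3 - \bar{X}_3$ respectively.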


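11.3 MULTIPLE CORRELATION CO-EFFICIENT

The co-efficient of multiple correlation measures the degree of relationship between a variable and its estimate from the regression equation. In other words, it is a product moment correlation between a variable, say $x_1$, and its value estimated by the regression equation $\hat{x}_1 = b_{12.3} x_2 + b_{13.2} x_3$. The co-efficient of multiple correlation between $x_1$ and the variables $x_2$ and $x_3$ combined, is denoted symbolically by $R_{1.23}$.

Let us denote the estimated value of $x_1$ by $\hat{x}_1$. Then by definition,

$$R_{1.23} = \frac{Cov(x_1, \hat{x}_1)}{\sqrt{Var(x_1)\,Var(\hat{x}_1)}} = \frac{\sum x_1 \hat{x}_1}{\sqrt{\sum x_1^2 \sum (\hat{x}_1)^2}}.$$

Now
$$\sum x_1 \hat{x}_1 = \sum x_1 (x_1 - x_{1.23}) \qquad (\because \hat{x}_1 = x_1 - x_{1.23})$$
$$= \sum x_1^2 - \sum x_1 x_{1.23}$$
$$= \sum x_1^2 - \sum x_{1.23}^2 \qquad (\because \sum x_1 x_{1.23} = \sum x_{1.23}^2)$$
$$= n(S_1^2 - S_{1.23}^2),$$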
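where $S_{1.23}^2$ is the sample variance of residuals.

$$\sum (\hat{x}_1)^2 = \sum (x_1 - x_{1.23})^2$$
$$= \sum x_1^2 + \sum x_{1.23}^2 - 2\sum x_1 x_{1.23}$$
$$= \sum x_1^2 + \sum x_{1.23}^2 - 2\sum x_{1.23}^2 \qquad (\because \sum x_1 x_{1.23} = \sum x_{1.23}^2)$$
$$= \sum x_1^2 - \sum x_{1.23}^2 = n(S_1^2 - S_{1.23}^2),$$
$$\sum x_1^2 = n S_1^2.$$

Substituting these values in the formula, we get

$$R_{1.23} = \frac{n(S_1^2 - S_{1.23}^2)}{\sqrt{n S_1^2 \cdot n(S_1^2 - S_{1.23}^2)}} = \sqrt{\frac{S_1^2 - S_{1.23}^2}{S_1^2}}.$$

Squaring, we get

$$R_{1.23}^2 = 1 - \frac{S_{1.23}^2}{S_1^2}.$$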
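The quantity $S_{1.23}^2$ can be expressed in terms of the simple correlation co-efficients between the pairs of variables as below:

$$S_{1.23}^2 = \frac{1}{n} \sum x_{1.23}^2 = \frac{1}{n} \sum x_1 x_{1.23} \qquad \text{(second property of residuals)}$$
$$= \frac{1}{n} \sum x_1 (x_1 - b_{12.3} x_2 - b_{13.2} x_3)$$
$$= \frac{1}{n} \sum x_1^2 - b_{12.3} \frac{1}{n} \sum x_1 x_2 - b_{13.2} \frac{1}{n} \sum x_1 x_3$$
$$= S_1^2 - b_{12.3} r_{12} S_1 S_2 - b_{13.2} r_{13} S_1 S_3.$$

Substituting the values of $b_{12.3}$ and $b_{13.2}$ in terms of simple correlation co-efficients and simplifying, we get

$$S_{1.23}^2 = S_1^2 \left[ \frac{1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2} \right].$$

Hence
$$R_{1.23}^2 = 1 - \frac{S_1^2 (1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23})}{S_1^2 (1 - r_{23}^2)} = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2},$$

so that

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}.$$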
It should be noted that $R_{1.23}$ is necessarily positive or zero as the term $\sum x_1 \hat{x}_1$, being equal to $\sum (\hat{x}_1)^2$, cannot be negative. If $R_{1.23} = 1$, then $S_{1.23}^2 = 0$, i.e. all the residuals $x_{1.23}$ are zero; the observed and estimated values of $x_1$ coincide. The multiple correlation in this case is called perfect, indicating a linear relationship between the variables.


Similarly, by the change of variables, we get

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{13}^2}}, \text{ and}$$

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}}.$$

Example 11.4 An instructor of mathematics wished to determine the relationship of grades on a final examination to grades on two quizzes given during the semester. Calling $X_1$, $X_2$ and $X_3$ the grades of a student on the first quiz, second quiz and final examination respectively, he made the following computations for a total of 120 students.

$\bar{X}_1 = 6.8$, $S_1 = 1.0$, $r_{12} = 0.60$
$\bar{X}_2 = 7.0$, $S_2 = 0.8$, $r_{13} = 0.70$
$\bar{X}_3 = 74$, $S_3 = 9.0$, $r_{23} = 0.65$

a) Find the least-squares regression equation of $X_3$ on $X_1$ and $X_2$.
b) Estimate the final grades of two students who scored respectively (1) 9 and 7, and (2) 4 and 8 on the two quizzes.
c) Compute $R_{3.12}$. (B.Sc. Eng.)
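a) Since the standard deviations and linear correlation co-efficients are given, the estimated regression equation of $X_3$ on $X_1$ and $X_2$ is

$$\frac{\hat{X}_3 - \bar{X}_3}{S_3} = \left[ \frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2} \right] \frac{X_1 - \bar{X}_1}{S_1} + \left[ \frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2} \right] \frac{X_2 - \bar{X}_2}{S_2}.$$

Now
$$\frac{r_{13} - r_{12} r_{23}}{1 - r_{12}^2} = \frac{0.70 - (0.60)(0.65)}{1 - (0.60)^2} = \frac{0.31}{0.64}, \text{ and}$$
$$\frac{r_{23} - r_{12} r_{13}}{1 - r_{12}^2} = \frac{0.65 - (0.60)(0.70)}{1 - (0.60)^2} = \frac{0.23}{0.64}.$$

Substituting these values, we get

$$\frac{\hat{X}_3 - 74}{9.0} = \frac{0.31}{0.64} \cdot \frac{X_1 - 6.8}{1.0} + \frac{0.23}{0.64} \cdot \frac{X_2 - 7.0}{0.8}$$

or $\hat{X}_3 - 74 = 4.36 (X_1 - 6.8) + 4.04 (X_2 - 7.0) = 4.36 X_1 - 29.648 + 4.04 X_2 - 28.28$

or $\hat{X}_3 = 16.07 + 4.36 X_1 + 4.04 X_2$,

which is the desired least squares regression equation of $X_3$ on $X_1$ and $X_2$.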
b) Student 1: When $X_1 = 9$ and $X_2 = 7$, we get

$\hat{X}_3 = 16.07 + 4.36 (9) + 4.04 (7) = 83.59 \approx 84.$

Student 2: When $X_1 = 4$ and $X_2 = 8$, we get

$\hat{X}_3 = 16.07 + 4.36 (4) + 4.04 (8) = 65.83 \approx 66.$


c) The co-efficient of multiple correlation $R_{3.12}$ is

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}} = \sqrt{\frac{(0.70)^2 + (0.65)^2 - 2(0.60)(0.65)(0.70)}{1 - (0.60)^2}} = \sqrt{\frac{0.3665}{0.64}} = \sqrt{0.5727} = 0.757.$$
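
Example 11.4 can be reproduced from the summary statistics alone. The following is a minimal sketch; the function names `regression_x3_on_x1_x2` and `multiple_r_3_12` are illustrative assumptions. Without intermediate rounding the constant comes out near 16.06 rather than 16.07, a difference due only to rounding the slopes to 4.36 and 4.04 in the worked solution.

```python
import math


def regression_x3_on_x1_x2(means, sds, r12, r13, r23):
    """Intercept and partial slopes of X3 on X1 and X2 from summary statistics."""
    m1, m2, m3 = means
    s1, s2, s3 = sds
    b31_2 = (s3 / s1) * (r13 - r12 * r23) / (1 - r12 ** 2)
    b32_1 = (s3 / s2) * (r23 - r12 * r13) / (1 - r12 ** 2)
    intercept = m3 - b31_2 * m1 - b32_1 * m2
    return intercept, b31_2, b32_1


def multiple_r_3_12(r12, r13, r23):
    """Co-efficient of multiple correlation of X3 on X1 and X2."""
    return math.sqrt((r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23) / (1 - r12 ** 2))


c, b31_2, b32_1 = regression_x3_on_x1_x2((6.8, 7.0, 74), (1.0, 0.8, 9.0), 0.60, 0.70, 0.65)
student1 = c + b31_2 * 9 + b32_1 * 7   # quiz scores 9 and 7
student2 = c + b31_2 * 4 + b32_1 * 8   # quiz scores 4 and 8
big_r = multiple_r_3_12(0.60, 0.70, 0.65)
print(round(c, 2), round(b31_2, 2), round(b32_1, 2))
print(round(student1), round(student2), round(big_r, 3))
```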

11.4 PARTIAL CORRELATION


A partial correlation measures the degree of linear relationship between any two variables in a multivariable problem under the condition that any common relationship or influence with all other variables (or some of them) has been removed. Stating differently, if there are three variables $X_1$, $X_2$ and $X_3$, then the correlation between $X_1$ and $X_2$ after removing the linear effect of $X_3$ from $X_1$ and from $X_2$ is called partial correlation. The sample co-efficient of partial correlation measuring the strength of the relationship (correlation) between $X_1$ and $X_2$ when the influence of $X_3$ has been removed, is denoted symbolically by $r_{12.3}$. By removing the linear effect, we mean subtracting the fitted regression $\hat{X}_1$ from the observed values $X_1$, obtaining $X_1 - \hat{X}_1$, a part of $X_1$ not explained by $X_3$.


To derive the co-efficient of partial correlation $r_{12.3}$, we use the variables $x_1$, $x_2$ and $x_3$ which are deviations from their means. The linear regressions of $x_1$ on $x_3$ and of $x_2$ on $x_3$ are $\hat{x}_1 = b_{13} x_3$ and $\hat{x}_2 = b_{23} x_3$. Removing the linear effect of $x_3$ from $x_1$ and from $x_2$ and denoting the residuals by $x_{1.3}$ and $x_{2.3}$, we get

$$x_{1.3} = x_1 - b_{13} x_3, \text{ and } x_{2.3} = x_2 - b_{23} x_3.$$

These residuals may be written as

$$x_{1.3} = x_1 - r_{13} \frac{S_1}{S_3} x_3 \quad \text{and} \quad x_{2.3} = x_2 - r_{23} \frac{S_2}{S_3} x_3.$$

Now the co-efficient of partial correlation is the product moment correlation co-efficient between $x_{1.3}$ and $x_{2.3}$. Thus by definition

$$r_{12.3} = \frac{\sum x_{1.3} x_{2.3}}{\sqrt{\sum x_{1.3}^2 \sum x_{2.3}^2}}.$$
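Now
$$\sum x_{1.3} x_{2.3} = \sum \left( x_1 - r_{13} \frac{S_1}{S_3} x_3 \right) \left( x_2 - r_{23} \frac{S_2}{S_3} x_3 \right)$$
$$= \sum x_1 x_2 - r_{23} \frac{S_2}{S_3} \sum x_1 x_3 - r_{13} \frac{S_1}{S_3} \sum x_2 x_3 + r_{13} r_{23} \frac{S_1 S_2}{S_3^2} \sum x_3^2$$
$$= n [r_{12} S_1 S_2 - r_{23} r_{13} S_1 S_2 - r_{13} r_{23} S_1 S_2 + r_{13} r_{23} S_1 S_2]$$
$$= n S_1 S_2 (r_{12} - r_{13} r_{23}),$$

$$\sum x_{1.3}^2 = \sum \left( x_1 - r_{13} \frac{S_1}{S_3} x_3 \right)^2 = \sum x_1^2 - 2 r_{13} \frac{S_1}{S_3} \sum x_1 x_3 + r_{13}^2 \frac{S_1^2}{S_3^2} \sum x_3^2$$
$$= n S_1^2 - 2 n r_{13}^2 S_1^2 + n r_{13}^2 S_1^2$$
$$= n S_1^2 (1 - r_{13}^2).$$

Similarly, $\sum x_{2.3}^2 = n S_2^2 (1 - r_{23}^2)$.

Substituting these values in the formula, we get

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}.$$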


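Alternatively. The partial correlation co-efficient between $x_1$ and $x_2$ when the influence of $x_3$ has been eliminated, is also defined as the geometric mean of the regression co-efficients $b_{12.3}$ and $b_{21.3}$ of the two partial regression lines of $x_1$ on $x_2$ and of $x_2$ on $x_1$ respectively, i.e.

$$r_{12.3} = \sqrt{b_{12.3} \times b_{21.3}}$$
$$= \sqrt{\frac{S_1}{S_2} \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2} \right] \times \frac{S_2}{S_1} \left[ \frac{r_{12} - r_{13} r_{23}}{1 - r_{13}^2} \right]}$$
$$= \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (r_{12.3} \text{ has the same sign as } b_{12.3} \text{ and } b_{21.3})$$

In a similar way, we can prove that

$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \quad \text{and} \quad r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}}.$$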

Example 11.5 From the following data, determine the linear regression equations of $X_1$ on $X_3$ and of $X_2$ on $X_3$.


Find the deviations of observed values of $X_1$ from the regression, viz. $X_{1.3}$. Repeat the same for $X_2$, i.e. obtain $X_{2.3}$. Determine the simple correlation co-efficient between the two sets of deviations $X_{1.3}$ and $X_{2.3}$. (P.U., B.A./B.Sc. 1977)


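The estimated simple regression equation of $X_1$ on $X_3$ is

$$\hat{X}_1 = b_{1.3} + b_{13} X_3,$$

where $b_{13} = \dfrac{n\sum X_1 X_3 - (\sum X_1)(\sum X_3)}{n\sum X_3^2 - (\sum X_3)^2}$ and $b_{1.3} = \bar{X}_1 - b_{13} \bar{X}_3$.


The estimated simple regression equation of $X_2$ on $X_3$ is

$$\hat{X}_2 = b_{2.3} + b_{23} X_3,$$

where $b_{23} = \dfrac{n\sum X_2 X_3 - (\sum X_2)(\sum X_3)}{n\sum X_3^2 - (\sum X_3)^2}$ and $b_{2.3} = \bar{X}_2 - b_{23} \bar{X}_3$.


The computations needed to find the b's are given in the table below: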
And the regression co-efficients are obtained as

$b_{13} = 1.73$, the denominator being $(5)(110) - (20)^2 = 150$,
$b_{1.3} = 14 - (1.73)(4) = 7.08$,
$b_{23} = \dfrac{(5)(191) - (40)(20)}{(5)(110) - (20)^2} = \dfrac{155}{150} = 1.03$, and
$b_{2.3} = 8 - (1.03)(4) = 3.88$.

Hence the desired regression equations are

$$\hat{X}_1 = 7.08 + 1.73 X_3 \quad \text{and} \quad \hat{X}_2 = 3.88 + 1.03 X_3.$$


Next, we compute the residuals $X_{1.3} = X_1 - 7.08 - 1.73 X_3$ and $X_{2.3} = X_2 - 3.88 - 1.03 X_3$, and the correlation between them. The necessary computations are given in the following table:


Hence the co-efficient of correlation between $X_{1.3}$ and $X_{2.3}$, which is the co-efficient of partial correlation between $X_1$ and $X_2$ when the influence of $X_3$ has been removed, is obtained as

$$r_{12.3} = \frac{\sum X_{1.3} X_{2.3}}{\sqrt{\sum X_{1.3}^2 \sum X_{2.3}^2}} \qquad (\because \sum X_{1.3} = 0 = \sum X_{2.3})$$
$$= \frac{3.2670}{\sqrt{(7.8670)(1.9670)}} = \frac{3.2670}{3.9340} = 0.83.$$


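Example 11.6 Given $r_{12} = 0.492$, $r_{13} = 0.927$ and $r_{23} = 0.758$, find all the partial correlation co-efficients.

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.492 - (0.927)(0.758)}{\sqrt{1 - (0.927)^2}\,\sqrt{1 - (0.758)^2}} = \frac{-0.2107}{\sqrt{0.1407 \times 0.4254}} = -0.86;$$

$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{0.758 - (0.492)(0.927)}{\sqrt{1 - (0.492)^2}\,\sqrt{1 - (0.927)^2}} = \frac{0.302}{\sqrt{0.7579 \times 0.1407}} = 0.92;$$

$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12}^2}} = \frac{0.927 - (0.758)(0.492)}{\sqrt{1 - (0.758)^2}\,\sqrt{1 - (0.492)^2}} = \frac{0.5541}{\sqrt{0.4254 \times 0.7579}} = 0.98.$$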

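Example 11.7 Show that if $x_3 = a x_1 + b x_2$, the three partial correlations are numerically equal to unity, $r_{13.2}$ having the sign of a, $r_{23.1}$ the sign of b and $r_{12.3}$ the opposite sign of a/b.


In the multiple regression equation $x_3 = a x_1 + b x_2$, we treat $x_3$ as dependent and $x_1$ and $x_2$ as independent variables. Let the three variables be measured from their respective means.

Squaring and summing over all values, we get

$$\sum x_3^2 = a^2 \sum x_1^2 + b^2 \sum x_2^2 \qquad (\sum x_1 x_2 = 0, \text{ as } x_1 \text{ and } x_2 \text{ are independent})$$
$$= n(a^2 S_1^2 + b^2 S_2^2).$$

Multiplying the given equation by $x_1$ and summing, we have

$$\sum x_1 x_3 = a \sum x_1^2 \qquad (\sum x_1 x_2 = 0, \text{ as } x_1 \text{ and } x_2 \text{ are independent})$$
$$= n a S_1^2.$$

$$\therefore r_{13} = \frac{\sum x_1 x_3}{n S_1 S_3} = \frac{a S_1}{w}, \text{ where } w^2 = a^2 S_1^2 + b^2 S_2^2 = S_3^2.$$

Similarly, $r_{23} = \dfrac{b S_2}{w}$ and $r_{12} = 0$.

Now
$$r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} = \frac{a S_1 / w}{\sqrt{1 - b^2 S_2^2 / w^2}} = \frac{a S_1}{\sqrt{a^2 S_1^2}} = \frac{a}{\sqrt{a^2}} = \pm 1, \text{ according as a is +ve or -ve.}$$

In other words, $r_{13.2}$ has the sign of a.

Again
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{-a b S_1 S_2 / w^2}{\sqrt{(b^2 S_2^2 / w^2)(a^2 S_1^2 / w^2)}} = \frac{-ab}{\sqrt{a^2 b^2}}.$$

$a^2 b^2$ is always positive, therefore $\sqrt{a^2 b^2}$ is always positive.

Now ab may be positive or negative.

Thus $r_{12.3}$ is numerically equal to unity and has the sign opposite to that of ab, or of a/b.

Similarly,
$$r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}} = \frac{b}{\sqrt{b^2}} = \pm 1, \text{ according as b is +ve or -ve.}$$

Hence the result.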


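11.4.1 Relationships between Multiple and Partial Correlation Co-efficients. The multiple correlation co-efficients can be connected with the various partial correlation co-efficients. We have shown earlier that

$$1 - R_{1.23}^2 = \frac{S_{1.23}^2}{S_1^2},$$

where
$$n S_{1.23}^2 = \sum x_{1.23}^2 = \sum x_{1.2} x_{1.23} \qquad \text{(second property of residuals)}$$
$$= \sum x_{1.2} (x_1 - b_{12.3} x_2 - b_{13.2} x_3)$$
$$= \sum x_{1.2}^2 - b_{13.2} \sum x_{1.2} x_{3.2} \qquad \text{because of the properties of residuals}$$
$$= n S_{1.2}^2 [1 - b_{13.2} b_{31.2}] = n S_{1.2}^2 (1 - r_{13.2}^2).$$

But $S_{1.2}^2 = S_1^2 (1 - r_{12}^2)$.

$$\therefore S_{1.23}^2 = S_1^2 (1 - r_{12}^2)(1 - r_{13.2}^2), \text{ or}$$
$$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2).$$

Proceeding in the same way, we can find

$$1 - R_{1.23}^2 = (1 - r_{13}^2)(1 - r_{12.3}^2).$$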
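
The two relations between multiple and partial correlation co-efficients can be verified numerically. The sketch below reuses the simple correlations of Example 11.6 purely as convenient test values; the function names are illustrative assumptions.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b after removing the linear effect of c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def multiple_r_squared_1_23(r12, r13, r23):
    """Square of the multiple correlation of X1 on X2 and X3."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)


r12, r13, r23 = 0.492, 0.927, 0.758
lhs = 1 - multiple_r_squared_1_23(r12, r13, r23)
via_r13_2 = (1 - r12 ** 2) * (1 - partial_corr(r13, r12, r23) ** 2)
via_r12_3 = (1 - r13 ** 2) * (1 - partial_corr(r12, r13, r23) ** 2)
print(lhs, via_r13_2, via_r12_3)   # the three values should agree
```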


11.5 CURVILINEAR REGRESSION


Sometimes a scatter diagram indicates that the relationship between the two variables will be more accurately described by a non-linear regression line. When this occurs, either we may transform one or both of the variables so that the transformed data appear approximately linear or we may use a polynomial equation. In the former case, the estimating equation may be an exponential or a logarithmic function. In the latter case, the estimating equation may be

$$\hat{Y} = a + bX + cX^2,$$

where a, b and c are the least-squares estimates of the population parameters in

$$E(Y) = \alpha + \beta X + \gamma X^2.$$

They are determined from the following set of normal equations:

$$\sum Y = na + b\sum X + c\sum X^2$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4.$$

The quadratic equation may also be transformed into a multiple linear form

$$\hat{X}_1 = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3,$$

where $X_1 = Y$, $a_{1.23} = a$, $b_{12.3} = b$, $b_{13.2} = c$, $X_2 = X$ and $X_3 = X^2$. A number of other curvilinear models are available. The co-efficient of determination and standard error of estimate can be obtained in the same way as in the case of linear regressions.
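
The normal equations for the quadratic can be set up and solved exactly like those of a two-regressor model, with $X$ and $X^2$ playing the roles of the two regressors. The sketch below uses hypothetical data constructed to satisfy Y = 2 + 3X + 0.5X² exactly; the function names `det3` and `fit_quadratic` are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Least-squares estimates (a, b, c) of Y = a + bX + cX^2 via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    coeffs = [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]]
    rhs = [sy, sxy, sx2y]
    d = det3(coeffs)
    solution = []
    for col in range(3):
        # Cramer's rule: swap one column for the right-hand side.
        m = [row[:] for row in coeffs]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)


# Hypothetical data lying exactly on Y = 2 + 3X + 0.5X^2.
xs = [0, 1, 2, 3, 4]
ys = [2 + 3 * x + 0.5 * x ** 2 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```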


EXERCISES

OBJECTIVE

a) State 'True' or 'False'. If the statement is not true then replace the underlined words with words that make the statement true:

i) A partial correlation coefficient measures the degree of relationship between a variable and its estimate from the regression line.
ii) A multiple correlation coefficient measures the degree of linear relationship between two variables in a multivariable problem when the influence with all other variables is removed.
iii) The multiple correlation coefficient is the square of the coefficient of multiple determination.
iv) The multiple correlation coefficient R will be negative in sign when all of the two-variable correlation coefficients are negative in sign.
v) The regression sum of squares in case of multiple regression is the explained variation.
vi) For a multiple regression analysis, if $\sum (Y - \bar{Y})^2 = 50$ and $\sum (Y - \hat{Y})^2 = 20$, then the coefficient of determination $R^2$ is equal to 0.70.
vii) The standard error of estimate in multiple regression has n - k degrees of freedom.
viii) The standard error of estimate is a measure of scatter of the observations about the regression line.
ix) The regression coefficients are the other name for multiple regression ...
x) In a multiple regression the addition of new variables will always reduce the standard error of estimate.

b) MULTIPLE CHOICE QUESTIONS

i) The range of multiple correlation coefficient R is
a) -1 to +1
b) 0 to ∞
c) 0 to 1
d) none of above.

ii) The range of partial correlation coefficient is
a) 0 to 1
b) -1 to +1
c) 0 to +∞
d) -1 to 0

iii) If the multiple correlation coefficient $R_{3.12} = 1$, then it implies a
a) perfect relationship
b) high relationship
c) weak linear relationship
d) perfect linear relationship

In the regression analysis, the explained variation of the dependent variable Y is given by 
a) (=?) 


ы X-fy 
(9 xd-ry 
d XY-Y) 


v) Which of the following is not a standard deviation?
a) Standard error of the slope coefficient
b) Mean square errors
c) Standard error of the estimator
d) Standard deviation of the Y variable

vi) The coefficient of determination in multiple regression is given by
a) $R^2_{Y.12} = 1 - (SST/SSE)$
b) $R^2_{Y.12} = 1 - (SSR/SST)$
c) $R^2_{Y.12} = 1 - (SSE/SSR)$
d) $R^2_{Y.12} = 1 - (SSE/SST)$

vii) The slope $b_1$ in the multiple regression equation $\hat{Y} = a + b_1 X_1 + b_2 X_2$ measures
a) the amount of variation in Y explained by $X_1$
b) the change in $\hat{Y}$ per unit change in $X_1$
c) the change in $\hat{Y}$ per unit change in $X_1$, holding $X_2$ constant
d) the change in $\hat{Y}$ per unit change in $X_2$, holding $X_1$ constant

viii) The predicted value for $X_1 = 1$, $X_2 = 5$ and $X_3 = 10$ by using the regression line $\hat{Y} = 30 - 10X_1 + 18X_2 - 7.5X_3$ is
a) 45
b) 15
c) 35
d) 50


ix) Which of the following statements remains always true?
a) The coefficient of multiple determination will increase when new variables are added
b) The coefficient of multiple determination will decrease when new variables are added
c) The adjusted coefficient of multiple determination will not decrease when new variables are added
d) Both a and c above

x) Which of the following relationship holds?


а) "32 =з X555 


Ø 32 = {бүз Х52 


С) "за = {зл Хз 


d) Allofabove 


SUBJECTIVE


11.1 a) What is a multiple regression? Explain the basic differences between simple regression and multiple regression.
b) What is meant by the co-efficient of multiple determination and multiple correlation?
c) Explain the assumptions underlying a multiple linear regression model.

11.2 Carry out the necessary computations to obtain the least-squares estimates of the parameters in the multiple regression model $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, given

(B.Z.U., M.A. Econ.)

11.3 Given the data

a) Calculate the estimated regression equation (i.e. $\hat{Y} = a + b_1 X_1 + b_2 X_2$) for the above data.
b) State the meaning of partial regression co-efficients $b_1$ and $b_2$. (B.Z.U., M.A. Econ.)

11.4
a) Find the least-squares regression line where $X_1$ is the dependent variable and $X_2$ and $X_3$ are independent variables.
b) Calculate the standard error of estimate, $s_{1.23}$.
c) Calculate the co-efficient of multiple determination and multiple correlation and interpret the result.
result. 
11.5 The following table shows the corresponding values of three variables $X_1$, $X_2$ and $X_3$.


a) Find the regression equation of $X_3$ on $X_1$ and $X_2$.
b) Estimate $X_3$ when $X_1 = 10$ and $X_2 = 6$.
c) Compute $R_{3.12}$ and $s_{3.12}$. (L.U., M.Sc. 1991)


--- 


11.6 The following data were collected to determine a suitable regression equation relating the length of an infant, Y (cm), to age, $X_1$ (days), and weight at birth, $X_2$ (kg):

Y:    57.5  52.8  61.2  67.0  53.5  62.7  56.2  68.5  69.2
X1:   78    69    77    88    67    80    74    94    102
X2:   2.75  2.15  4.41  5.52  3.21  4.32  2.31  4.30  3.71

a) Fit a least-squares regression equation of the form $\hat{Y} = a + b_1 X_1 + b_2 X_2$.
b) Predict the average length of infants who are 75 days old and weighed 3.15 kg at birth.
c) Calculate the standard error of estimate $s_{Y.12}$.
. 


11.7 a) Define the multiple correlation co-efficient and prove that

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}.$$

b) Calculate the multiple correlation co-efficient $R_{1.23}$ of $X_1$ on $X_2$ and $X_3$ from the following data:


(P.U., B.A. (Hons.) Part-I, 1969)


The following data represent concomitant values of three variables. 


box, 432-918 752 'd6 42 "48 


1 
р ый rene жо” eA УО та 
POE EU CENT T E 


© 


Calculate all the multiple correlation coefficients, working out the usual simple correlation 
co-efficients. (B.Z.U., М.А. Econ. 1991) 


b) Given Хі =20, S, =1.0, n; --0.20. 
Х:-36, 5,-2.0, n -0.40, 
Хаі-12, 5,-1.5, № 20.50. 


Find the regression equation of X; оп X and X2. . (P.U., В.А.В:5с. E 
11.9 a) Distinguish between the simple and the multiple correlation co-efficients. 


b) If by is the regression co-efficient of X; on Xj, then calculate the multiple co 
co-efficient of X; with X, and X;, where 


bi с] 0.75, by aed 0.58, Б, - 0.88, 
Б = 0.53, b3, = 1.68, and b3; = 1.30 (P.U., В.А /В.5с. 


c) Three variables have in pairs simple correlation coefficients: r = 0.60; m=i 
753 = 0.65. Find the multiple correlation coefficient А, 1; of X; оп X, and X3. 


(P.U., B.A./B.Sc. 
11.10 a) Three variables have in pairs simple correlation co-efficients given by
$r_{12} = 0.8$, $r_{13} = -0.7$, $r_{23} = -0.9$.
Find the multiple correlation co-efficient $R_{1.23}$ of $X_1$ on $X_2$ and $X_3$.


(P.U., B.A./B.Sc.)


b) Calculate the multiple correlation соет, 13 and the partial correlation co-e 
from the values given below: AN 


bi; 7 —0.1, bz, = —04, Бат „24° 


‚Ъз = 0.6, Бъз = 0.67, 550.38 (P.U., B.AJB.Sc- 
11.11 a) Explain what is meant by partial correlation. Establish a formula for the co-efficient of partial correlation.

b) From the following data, determine the linear regression equations of $X_1$ on $X_3$ and $X_2$ on $X_3$.


Find the deviations of observed values of $X_1$ from the regression equation, viz. $X_{1.3}$. Repeat the same for $X_2$, i.e. obtain $X_{2.3}$. Determine the simple correlation co-efficient between the two sets of deviations $X_{1.3}$ and $X_{2.3}$.


11.12 The following means, standard deviations and correlations are found for
$X_1$ = Seed-hay crops in cwts. per acre,
$X_2$ = Spring rainfall in inches,
$X_3$ = Accumulated temperature above 42°F in spring, in a certain district in England over a period of years.

$\bar{X}_1 = 28.02$, $S_1 = 4.42$, $r_{12} = 0.80$,
$\bar{X}_2 = 4.91$, $S_2 = 1.10$, $r_{13} = -0.40$,
$\bar{X}_3 = 594$, $S_3 = 85$, $r_{23} = -0.56$.


Find the partial correlation and the regression equation for hay-crop on spring rainfall and accumulated temperature. (P.U., B.A./B.Sc. 1974)


11.13 The following values represent sample values of 450 college students in which the three variables represent marks obtained ($X_1$), general intelligence scores ($X_2$) and hours of study ($X_3$). Find the regression equation for estimating marks obtained. Find all three partial correlations and interpret them in the light of the corresponding simple correlations.

$\bar{X}_1 = 18.5$, $S_1 = 11.2$, $r_{12} = 0.60$,
$\bar{X}_2 = 100.6$, $S_2 = 15.8$, $r_{13} = 0.32$,
$\bar{X}_3 = 24$, $S_3 = 6.0$, $r_{23} = -0.35$ (P.U., M.A. Stat., 1960)


11.14 a) Prove that a variable and a residual are uncorrelated if the subscript of the variable is included among the secondary subscripts of the residual.


b) Given the equations of the three regression planes
$x_1 = 0.41 x_2 + 0.23 x_3$,
$x_2 = 0.96 x_1 - 0.025 x_3$,
$x_3 = 1.04 x_1 - 0.05 x_2$.

Calculate the partial correlation co-efficients. Do we have sufficient data to determine the correlation co-efficients $r_{12}$, $r_{13}$ and $r_{23}$? (P.U., B.A. (Hons.) Part-I, 1970)


15a ПХта%Ь;; + bis; s Ху =d + Бул Х + b312 Ху are the regression equations of 
X, on X; and X5, and о, 9% on X, and X, respectively, prove that D =Й үз» 
b)  Isit possible to әде е following from a set of data? 
(i)  7r1270.6,73 = 0.8, 73; = —0.5. 
(ü) 7370.7, г, 7 -04, ri; = 0.6. 
(i) ғ = 0.01, rj = 0.66, ғ = -0.70. 


6 If Х\, X; and X; are three correlated variables, where S,-1, 5;=1.3, 5;=1.9 and r,2=0.370, 
тіз -0.641, and ғҙҙ- —0.736, find ғіз2. If X47 Хү+ Xa, obtain raz, ғаз and ғаз г. Verify that the two 
partial correlation co-efficients are equal and explain this result. 

(M.Sc. Stat.; P.U., 1972, LU., 1990, 92. 94) 


11.17 a) Differentiate between multiple correlation and partial correlation.
b) If $R_{1.23} = 1$, prove that (i) $R_{2.13} = 1$ and (ii) $R_{3.12} = 1$.
c) If $R_{1.23} = 0$, does it necessarily follow that $R_{2.13} = 0$?
d) If $r_{12} = r_{13} = r_{23} = r \neq 1$, then show that
$$R_{1.23} = R_{2.13} = R_{3.12} = \frac{r\sqrt{2}}{\sqrt{1+r}}.$$
Discuss the case when $r = 1$. (B.Sc. Eng. 1976)
11.18 a) Show that if $r_{12}$ is zero, $r_{12.3}$ will not be zero unless one at least of $r_{13}$ and $r_{23}$ is zero.

b) If the relation $aX_1 + bX_2 + cX_3 = 0$ holds true for all sets of values of $X_1$, $X_2$ and $X_3$, find out
the three partial correlation co-efficients.


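11.19 Show that the correlation co-efficient between the residuals $x_{1.23}$ and $x_{2.13}$ is equal and opposite to
that between $x_{1.3}$ and $x_{2.3}$. (P.U., M.A. 1963)

Solution. The co-efficient of correlation between $x_{1.23}$ and $x_{2.13}$ is given by

$$\frac{\mathrm{Cov}(x_{1.23}, x_{2.13})}{\sqrt{\mathrm{Var}(x_{1.23})\,\mathrm{Var}(x_{2.13})}} = \frac{\sum x_{1.23}\,x_{2.13}}{n\,S_{1.23}\,S_{2.13}} = \frac{\sum x_{1.23}\,(x_2 - b_{21.3}x_1 - b_{23.1}x_3)}{n\,S_{1.23}\,S_{2.13}}$$

Since a residual is uncorrelated with every variable whose subscript appears among its secondary subscripts, $\sum x_{1.23}x_2 = 0 = \sum x_{1.23}x_3$, while $\sum x_{1.23}x_1 = \sum x_{1.23}^2 = n S_{1.23}^2$. Therefore

$$\mathrm{Corr} = \frac{-b_{21.3}\sum x_{1.23}^2 - 0}{n\,S_{1.23}\,S_{2.13}} = -b_{21.3}\,\frac{S_{1.23}}{S_{2.13}}.$$

Substituting the values $S_{1.23} = S_1\sqrt{(1-r_{13}^2)(1-r_{12.3}^2)}$, $S_{2.13} = S_2\sqrt{(1-r_{23}^2)(1-r_{12.3}^2)}$ and $b_{21.3} = r_{12.3}\,\dfrac{S_2\sqrt{1-r_{23}^2}}{S_1\sqrt{1-r_{13}^2}}$, and simplifying, we get

$$\mathrm{Corr} = -b_{21.3}\,\frac{S_1\sqrt{1-r_{13}^2}}{S_2\sqrt{1-r_{23}^2}} = -r_{12.3}.$$

Again the co-efficient of correlation between $x_{1.3}$ and $x_{2.3}$ is

$$\frac{\mathrm{Cov}(x_{1.3}, x_{2.3})}{\sqrt{\mathrm{Var}(x_{1.3})\,\mathrm{Var}(x_{2.3})}} = \frac{\sum x_{1.3}\,x_{2.3}}{n\,S_{1.3}\,S_{2.3}} = r_{12.3}.$$

Hence the result.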
11.20 Using the method of least squares, fit a quadratic model $Y = \alpha + \beta_1 X + \beta_2 X^2 + e$ to the following
data:
xX | -2 -1 0 i ee) 
їз |704 AB 2221522310 


Also calculate the standard error of estimate. 


12.1 INTRODUCTION


Let us suppose that we wish to approximate (describe) a certain type of function that best expresses
the association that exists between variables. A scatter plot of the set of values of the variables makes it
possible to visualize a smooth curve that effectively approximates the given data set. A more useful way
to represent this sort of approximating curve is by means of an equation or a formula. The term applied to
the process of determining the equation and/or estimating the parameters appearing in the equation of
the approximating curve is curve fitting.


It is relevant to point out that the relationship between the variables may be functional or
regressional. In a functional relationship, a variable Y has a true value corresponding to each possible value
of another variable X, i.e. there is no question of random variation in the values of Y, and we make no
probabilistic assumptions in this respect. In this chapter, we shall limit our discussion to some functional
relationships, i.e. problems of approximation and not of regression (already discussed earlier). Such
relationships, which are common in the natural sciences, may be linear or non-linear.


12.2 APPROXIMATING CURVES AND THE PRINCIPLE OF LEAST SQUARES


The data sets encountered in practice vary greatly in nature. It is therefore necessary to decide
which type of approximating curve and equation should be used. For this purpose, some of the many
common types of approximating curves and their equations are given below:

Straight line or linear curve,                  $Y = a + bX$
Parabola of second degree or quadratic curve,   $Y = a + bX + cX^2$
Parabola of third degree or cubic curve,        $Y = a + bX + cX^2 + dX^3$
Exponential curve,                              $Y = ab^X$ or $Y = ae^{bX}$
Geometric or power curve,                       $Y = aX^b$
Hyperbola, and so on.

In these equations, Y is the dependent variable and X the independent variable. In some situations,
however, the variables X and Y may be reversed.


We may approximate a given set of data by drawing a free hand curve, covering most of the points.
But it is clear that different individuals would draw different curves according to their individual
judgement. Therefore this procedure of fitting a curve is not satisfactory.


The principle of least squares is applicable to curve fitting where the purpose is simply one of
smoothing (or approximation) of a set of observations. Accordingly, we choose to determine the values of
the parameters in the equations of approximating curves so as to make the sum of squares of residuals a
minimum. A residual has been defined as the difference between the observed value and the
corresponding value of the approximating curve.


12.2.1 Fitting a Straight Line. A straight line is the simplest type of approximating curve and its
equation is written as

$$Y = a + bX$$

where the values of a and b are to be determined.


Given n pairs of observations $[(X_i, Y_i),\ i = 1, 2, \ldots, n]$ to which we wish to fit a straight line, we
determine the values of a and b by the principle of least squares, which calls for the minimization of S,
the sum of squares of the differences between the actual $Y_i$ values and the corresponding values given
by $a + bX_i$. That is, we minimize

$$S = \sum_{i=1}^{n} (Y_i - a - bX_i)^2 .$$

Minimizing S, we set $\dfrac{\partial S}{\partial a} = 0$ and $\dfrac{\partial S}{\partial b} = 0$.
That is, $\dfrac{\partial S}{\partial a} = 2\sum (Y - a - bX)(-1) = 0$, and
$\dfrac{\partial S}{\partial b} = 2\sum (Y - a - bX)(-X) = 0$,
which on simplification become

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2 .$$

Solving these two normal equations simultaneously, we get

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b\bar{X}.$$

The value of a indicates that the least squares line passes through the means of the observations $(\bar{X}, \bar{Y})$.


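It should be noted that, when the origin is believed to lie on the curve, the straight line
is simply $Y = bX$ and the sum of squares of deviations to be minimized is

$$S = \sum (Y - bX)^2 .$$

For a minimum value of S, $\dfrac{\partial S}{\partial b}$ must be zero, that is

$$\frac{\partial S}{\partial b} = 2\sum (Y - bX)(-X) = 0, \quad \text{which gives} \quad \sum XY = b\sum X^2$$

as the normal equation and whence $b = \dfrac{\sum XY}{\sum X^2}$.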


The sum of squares of residuals for a straight line is
$$S = \sum (Y - a - bX)^2 = \sum [Y(Y - a - bX)] = \sum Y^2 - a\sum Y - b\sum XY .$$


Example 12.1 Fit a straight line by the method of least squares to the following data:


DG EET 


Also find the sum of squares of residuals. (P.U., B
Let the equation of the straight line to be fitted to the data be $Y = a + bX$, where a and b are to be
evaluated.


The normal equations for determining a and b are
$$\sum Y = na + b\sum X,$$
$$\sum XY = a\sum X + b\sum X^2 .$$
We now calculate $\sum X$, $\sum X^2$, $\sum Y$ and $\sum XY$. The totals are

$n = 5$, $\sum X = 15$, $\sum X^2 = 55$, $\sum Y = 32$, $\sum XY = 115$, $\sum Y^2 = 242$.

Thus the normal equations become
5a + 15b = 32
15a + 55b = 115
Solving these two equations simultaneously, we obtain
a = 0.7 and b = 1.9
Hence the equation of the required straight line is

Y = 0.7 + 1.9X

The sum of squares of residuals is given by

$$S = \sum (Y - a - bX)^2 = \sum Y^2 - a\sum Y - b\sum XY$$
S = 242 - 0.7 (32) - 1.9 (115)
  = 242 - 240.9 = 1.1.
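
The arithmetic of Example 12.1 can be checked with a short Python sketch. The function name `fit_line` and the five data pairs are assumptions made for illustration: the pairs were chosen only because they give the totals used in the example (ΣX = 15, ΣX² = 55, ΣY = 32, ΣXY = 115, ΣY² = 242), and they are not necessarily the original table.

```python
def fit_line(xs, ys):
    """Least-squares line Y = a + bX obtained from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n                 # line passes through (mean X, mean Y)
    s = syy - a * sy - b * sxy            # sum of squares of residuals
    return a, b, s

# Illustrative data (assumption): reproduces the totals quoted in Example 12.1.
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 6, 9, 10]
a_hat, b_hat, s_res = fit_line(xs, ys)
print(round(a_hat, 4), round(b_hat, 4), round(s_res, 4))
```

With these totals the sketch should agree with the values a = 0.7, b = 1.9 and S = 1.1 found above.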


12.2.2 Fitting a Second Degree Parabola. The simplest type of a non-linear approximating curve
is the second degree parabola that has the equation

$$Y = a + bX + cX^2$$

where the values of a, b and c are to be determined.

Let us suppose that we wish to fit this parabolic curve to n pairs of observations $(X_i, Y_i)$, $i = 1, 2, \ldots, n$.
Then we need to find those values of a, b and c which will minimize the sum of squares of
differences between actual Y values and corresponding values obtained by $a + bX + cX^2$ (the principle of
least squares). That is, we minimize

$$S = \sum (Y_i - a - bX_i - cX_i^2)^2 .$$




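Minimizing S, we need to set its partial derivatives w.r.t. a, b and c equal to zero. Thus

$$\frac{\partial S}{\partial a} = 2\sum (Y_i - a - bX_i - cX_i^2)(-1) = 0,$$
$$\frac{\partial S}{\partial b} = 2\sum (Y_i - a - bX_i - cX_i^2)(-X_i) = 0, \text{ and}$$
$$\frac{\partial S}{\partial c} = 2\sum (Y_i - a - bX_i - cX_i^2)(-X_i^2) = 0.$$

Simplifying, we get the following three normal equations
$$\sum Y = na + b\sum X + c\sum X^2$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$$
These equations are solved simultaneously to determine the values of a, b and c.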


The sum of squares of residuals in case of second degree parabola is given by
$$S = \sum (Y - a - bX - cX^2)^2 = \sum [Y(Y - a - bX - cX^2)]$$
$$\phantom{S} = \sum Y^2 - a\sum Y - b\sum XY - c\sum X^2 Y .$$


Example 12.2 Fit a second degree parabola to the following data, taking X as the independent
variable.


(P.U., В.А /В.5с. 1 
Let the equation of the second degree parabola be

$$Y = a + bX + cX^2$$
The normal equations are

$$\sum Y = na + b\sum X + c\sum X^2$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4$$

The computations give the following totals:

$n = 5$, $\sum X = 10$, $\sum X^2 = 30$, $\sum X^3 = 100$, $\sum X^4 = 354$, $\sum Y = 12.9$, $\sum XY = 37.1$, $\sum X^2 Y = 130.3$.

Putting these values in the normal equations, we get
12.9 = 5a + 10b + 30c
37.1 = 10a + 30b + 100c
130.3 = 30a + 100b + 354c

Solving them as simultaneous equations in a, b and c, we obtain
a = 1.42, b = -1.07, and c = 0.55.
Hence the equation of the required second degree parabola is
$$Y = 1.42 - 1.07X + 0.55X^2$$
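
The three normal equations of Example 12.2 can be solved mechanically. The sketch below is only an illustration: `gauss_solve` is a hypothetical helper (plain Gaussian elimination with partial pivoting), and the totals are the ones appearing in the normal equations above.

```python
def gauss_solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Totals taken from the normal equations of Example 12.2.
n, sx, sx2, sx3, sx4 = 5, 10, 30, 100, 354
sy, sxy, sx2y = 12.9, 37.1, 130.3

a2, b2, c2 = gauss_solve(
    [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]],
    [sy, sxy, sx2y],
)
print(round(a2, 2), round(b2, 2), round(c2, 2))
```

The solution should match a = 1.42, b = -1.07 and c = 0.55 obtained by hand.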


Also find the sum of squares of residuals. (P.U., B.A./B.Sc. 1993)
The curve to be fitted is $Y = aX^2 + bX$.

The normal equations are

$$\sum X^2 Y = a\sum X^4 + b\sum X^3 \quad \text{and} \quad \sum XY = a\sum X^3 + b\sum X^2 .$$
The arithmetic is arranged in the table below:

  X     X²    X³    X⁴     Y     XY    X²Y
  0      0     0     0     1      0      0
  1      1     1     1     5      5      5
  2      4     8    16    12     24     48
  3      9    27    81    20     60    180
  4     16    64   256    25    100    400
  5     25   125   625    36    180    900
 Total  55   225   979    99    369   1533

Substitution gives

979a + 225b = 1533
225a + 55b = 369

Solving them simultaneously, we get
a = 0.4006 and b = 5.0703.
Hence the desired equation is $Y = 0.40X^2 + 5.07X$.
The sum of squares of residuals is given by

$$S = \sum (Y - aX^2 - bX)^2 = \sum [Y(Y - aX^2 - bX)]$$
$$\phantom{S} = \sum Y^2 - a\sum X^2 Y - b\sum XY$$
= 2491 - (0.40)(1533) - (5.07)(369)
= 2491 - 613.2 - 1870.83 = 6.97


12.2.3 Fitting of Higher Degree Parabolic Curves. A parabolic curve of degree p, approximating
a set of observations $[(X_i, Y_i),\ i = 1, 2, \ldots, n]$ has the equation

$$Y_i = a + bX_i + cX_i^2 + \ldots + kX_i^p$$
where a, b, c, ..., k are the unknown quantities and where $k \neq 0$, and $n > p + 1$.

The problem is to determine the (p + 1) unknown quantities a, b, c, ..., k in such a way that the
resulting values of $Y_i$ should be as close as possible to the observed values. We, therefore, take the sum of
squares of the residuals, i.e.

$$S = \sum_{i=1}^{n} (Y_i - a - bX_i - cX_i^2 - \ldots - kX_i^p)^2$$

which is a function of a, b, c, ..., k as $(X_i, Y_i)$ are certain numbers. The principle of least-squares calls for
the selection of that parabolic curve that minimizes S, the sum of squares of differences between observed
values of Y and the corresponding values calculated from the curve. To minimize S, we find

$\dfrac{\partial S}{\partial a}, \dfrac{\partial S}{\partial b}, \ldots, \dfrac{\partial S}{\partial k}$ and set them equal to zero. Simplification leads to the following (p+1) equations:

$$\sum Y = na + b\sum X + c\sum X^2 + \ldots + k\sum X^p$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 + \ldots + k\sum X^{p+1}$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4 + \ldots + k\sum X^{p+2}$$
$$\vdots$$
$$\sum X^p Y = a\sum X^p + b\sum X^{p+1} + c\sum X^{p+2} + \ldots + k\sum X^{2p}$$

These are the normal equations for fitting the parabolic curve of degree p. Solving them
simultaneously, we determine a, b, c, ..., k.

For the particular case, p = 3, the normal equations for fitting the cubic parabola
$Y_i = a + bX_i + cX_i^2 + dX_i^3$ become

$$\sum Y = na + b\sum X + c\sum X^2 + d\sum X^3$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 + d\sum X^4$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4 + d\sum X^5$$
$$\sum X^3 Y = a\sum X^3 + b\sum X^4 + c\sum X^5 + d\sum X^6$$
Similarly, parabolic curves of higher degree may be fitted.
The sum of squares of residuals in case of cubic parabola is given by
$$S = \sum (Y - a - bX - cX^2 - dX^3)^2$$
$$\phantom{S} = \sum Y^2 - a\sum Y - b\sum XY - c\sum X^2 Y - d\sum X^3 Y$$
A better fit. It is important to note that the sum of squares of residuals enables us to make some
sort of comparison. A simple way of judging whether a straight line, a quadratic parabola or a cubic
parabola is likely to give the better fit, is to calculate the sum of squares of residuals in each case. The
smaller the sum of squares, the better is the fit.
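
The (p + 1) normal equations have a regular pattern: the coefficient in row i and column j is ΣX^(i+j) and the right-hand side of row i is ΣX^i·Y. The following sketch builds and solves them for any degree and compares the residual sums of squares of a straight line and a second degree parabola. The names `polyfit_normal` and `gauss_solve` are hypothetical helpers, and the five data pairs are an assumption chosen to be consistent with the totals of Example 12.2.

```python
def gauss_solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def polyfit_normal(xs, ys, p):
    """Fit Y = c0 + c1*X + ... + cp*X**p by solving the (p+1) normal equations."""
    power_sums = [sum(x ** k for x in xs) for k in range(2 * p + 1)]        # sums of X^k
    rhs = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(p + 1)]  # sums of X^k * Y
    matrix = [[power_sums[i + j] for j in range(p + 1)] for i in range(p + 1)]
    coef = gauss_solve(matrix, rhs)
    fitted = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]
    resid_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return coef, resid_ss

# Illustrative data (assumption): consistent with the totals of Example 12.2.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 1.3, 2.5, 6.3]
line_coef, ss_line = polyfit_normal(xs, ys, 1)
quad_coef, ss_quad = polyfit_normal(xs, ys, 2)
print(line_coef, ss_line)
print(quad_coef, ss_quad)
```

Adding a term can never increase the residual sum of squares, so in practice the comparison is a question of whether the reduction is large enough to justify the extra constant.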


12.2.4 Change of Origin and Unit. The computational labour may be reduced by a suitable
change of origin and unit. If the given values of $X_i$ (i = 1, 2, ..., n) are equally spaced with a common
interval h and n is an odd number of values, say, n = 2k+1, the normal equations are simplified by taking
the mid value of X as the origin and the common interval h as unit of measurement. That is, if $X_0$ be the
mid value, then $u_i = (X_i - X_0)/h$ takes the values -k, -(k-1), ..., -2, -1, 0, 1, 2, ..., (k-1), k. Hence we get
$\sum u = 0 = \sum u^3 = \ldots$ If instead, n is an even number, say n = 2k, we take the origin at the mean of the two
middle values of X and h/2 as the new unit. The values of $u_i$ then become -(2k-1), -(2k-3), ..., -3, -1, 1, 3,
..., (2k-3), (2k-1), so that $\sum u = 0 = \sum u^3 = \ldots$ (Also see chapter 13).
Example 12.4 The profits, £Y, of a certain company in the Xth year of its life are given by

  X      1      2      3      4      5
  Y   2500   2800   3300   3900   4600

Taking u = X - 3 and v = (Y - 3300)/100, find the parabolic curve of v on u in the form $v = a + bu + cu^2$.
Deduce the curve of Y on X.

(P.U., B.A./B.Sc. (Hons) 1964)

Since u = X - 3 (given), we find that the sums of odd powers of u are zero, i.e. $\sum u = 0 = \sum u^3$.
The normal equations are thus reduced to

$$\sum v = na + c\sum u^2$$
$$\sum uv = b\sum u^2,$$
$$\sum u^2 v = a\sum u^2 + c\sum u^4 .$$

The sums are computed in the following table.

  X      Y      u     v    u²    u⁴    uv    u²v
  1   2500    -2    -8     4    16    16    -32
  2   2800    -1    -5     1     1     5     -5
  3   3300     0     0     0     0     0      0
  4   3900     1     6     1     1     6      6
  5   4600     2    13     4    16    26     52
 Total         0     6    10    34    53     21

Substituting these values in the normal equations, we get

5a + 10c = 6,
10b = 53,
10a + 34c = 21.

Solving them, we find a = -0.086, b = 5.3 and c = 0.643.
The equation of the required parabolic curve is therefore
$$v = -0.086 + 5.3u + 0.643u^2 .$$

In order to deduce the parabolic curve of Y on X we replace u by X - 3 and v by $\dfrac{Y - 3300}{100}$ in the
above relation. Thus we obtain

$$\frac{Y - 3300}{100} = -0.086 + 5.3(X - 3) + 0.643(X - 3)^2 .$$
Simplifying, we get
$$Y = 2280 + 144.2X + 64.3X^2,$$
the required parabolic curve of Y on X.
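
The saving from the change of origin in Example 12.4 is that b separates out and only two equations remain for a and c. A minimal sketch follows; the profit figures are the five values of the example, two of which are garbled in the printed table and are taken here as 3300 and 3900 (an assumption consistent with the totals Σuv = 53 and Σu²v = 21), and all variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [2500, 2800, 3300, 3900, 4600]   # 3rd and 4th values are an assumption

# Coded variables: origin at the mid value of X, Y scaled to small numbers.
us = [x - 3 for x in xs]
vs = [(y - 3300) / 100 for y in ys]

n = len(us)
su2 = sum(u ** 2 for u in us)
su4 = sum(u ** 4 for u in us)
sv = sum(vs)
suv = sum(u * v for u, v in zip(us, vs))
su2v = sum(u * u * v for u, v in zip(us, vs))

# Because the sums of odd powers of u vanish, b decouples from a and c.
b_uv = suv / su2
det = n * su4 - su2 ** 2
a_uv = (sv * su4 - su2 * su2v) / det
c_uv = (n * su2v - su2 * sv) / det

# Transform back: Y = 3300 + 100 * (a + b(X - 3) + c(X - 3)^2).
const_term = 3300 + 100 * (a_uv - 3 * b_uv + 9 * c_uv)
x_term = 100 * (b_uv - 6 * c_uv)
x2_term = 100 * c_uv
print(round(a_uv, 3), b_uv, round(c_uv, 3))
print(round(const_term, 1), round(x_term, 1), round(x2_term, 1))
```

Working with unrounded constants gives coefficients of X and X² close to 144.3 and 64.3; the value 144.2 in the text arises from rounding c to 0.643 before transforming back.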
12.3 EXPONENTIAL CURVES

Equations in which one of the variable quantities occurs as an exponent, such as $Y = ac^{bX}$, are called
exponential equations and the graphs showing these equations are known as exponential curves. Exponential
curves are used to describe a relation in which one variable forms approximately a geometric progression, while
the other forms an arithmetic progression. Data of this hybrid type frequently occur in fields such as
banking and economics. In the equation $Y = ac^{bX}$, the letter c is a fixed constant, usually either 10 or
e, and a and b are determined from the data. If a and b are estimated by the method of least-squares, we
need to minimize S, where

$$S = \sum (Y_i - ae^{bX_i})^2 .$$

Setting the partial derivatives with respect to a and b equal to zero, we have

$$\frac{\partial S}{\partial a} = 2\sum [Y_i - ae^{bX_i}][-e^{bX_i}] = 0, \text{ and}$$
$$\frac{\partial S}{\partial b} = 2\sum [Y_i - ae^{bX_i}][-ae^{bX_i} \cdot X_i] = 0.$$

Simplifying, we get
$$\sum Y_i e^{bX_i} = a\sum e^{2bX_i}$$
$$\sum X_i Y_i e^{bX_i} = a\sum X_i e^{2bX_i}$$
It is difficult to solve these equations as the solution requires tedious numerical computation. The
solution simplifies if the non-linear curve can be reduced to the linear form by some
transformation of one or both the variables. The equation $Y = ae^{bX}$ can be linearized by taking
logarithms to the base 10 of both sides. Thus the exponential curve becomes

$$\log Y = \log a + (b \log e) X$$

which may be written as
$$Y' = A + BX$$

where $Y' = \log Y$, $A = \log a$ and $B = b \log e$. But this is the equation of a straight line in log Y and X. Hence
the method of fitting an exponential curve to the observed set of data is to fit a straight line to the
logarithms of the Y's. It should be noted that it is the deviations of log Y, and not of Y, which are being
minimized. It is relevant to point out that the log form is better for calculating the values from the fitted
curve.


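We give some of the more common non-linear curves with suitable transformations to convert
them into linear form Y = a + bX.

Power curve $Y = aX^b$:  $Y' = \log Y$, $A = \log a$, $X' = \log X$, giving $Y' = A + bX'$.

Exponential curve $Y = ae^{bX}$:  $Y' = \log Y$, $A = \log a$, $B = b \log e$, giving $Y' = A + BX$.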
It is worth remarking that, if the variable Y incorporates an element of random variation, we
introduce a random error term e and the equations become

$$Y = a + bX + e$$
$$Y = a + bX + cX^2 + e \quad \text{etc.}$$
These will be very similar to the regression models discussed in an earlier chapter.
Example 12.5 Fit an exponential curve $Y = ae^{bX}$ to the following data:

  X     1     2      3      4       5       6
  Y   1.6   4.5   13.8   40.2   125.0   363.0

(P.U., B.A/B.Sc. (Hons.), 1962; B.Z.U., 1976)
We can write the given equation as
   $\log Y = \log a + (b \log e) X$
or $Y' = A + BX$ (form of a straight line)

where $Y' = \log_{10} Y$, $A = \log a$ and $B = b \log_{10} e$.
As the equation is linear in $Y' = \log Y$ and X, the two normal equations are
$$\sum Y' = nA + B\sum X$$
$$\sum XY' = A\sum X + B\sum X^2 .$$

The necessary calculations give the totals

$n = 6$, $\sum X = 21$, $\sum X^2 = 91$, $\sum Y' = 8.258$, $\sum XY' = 37.19$.

Substituting these values, the normal equations become
6A + 21B = 8.258
21A + 91B = 37.19
Solving these equations simultaneously, we get
A = -0.2805, and B = 0.4734.
a = anti-log A = anti-log (-0.2805)
  = anti-log $\bar{1}.7195$ = 0.52
and 0.4343 b = 0.4734, or b = 1.09 (since $\log_{10} e$ = 0.4343)
Hence the equation of the curve fitted to the data is
$$Y = 0.52\, e^{1.09X}$$
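
The logarithmic method of Example 12.5 amounts to fitting a straight line to (X, log Y) and transforming the two constants back. The sketch below illustrates this under stated assumptions: the helper `fit_exponential` is hypothetical, and the Y values are those of the example with decimal points restored.

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = a * e**(b*X) by least squares on log10(Y); returns (a, b)."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    sx, sl = sum(xs), sum(logs)
    sxx = sum(x * x for x in xs)
    sxl = sum(x * l for x, l in zip(xs, logs))
    big_b = (n * sxl - sx * sl) / (n * sxx - sx ** 2)   # slope B = b * log10(e)
    big_a = (sl - big_b * sx) / n                       # intercept A = log10(a)
    return 10 ** big_a, big_b / math.log10(math.e)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.6, 4.5, 13.8, 40.2, 125.0, 363.0]   # decimal points restored (assumption)
a_exp, b_exp = fit_exponential(xs, ys)
print(round(a_exp, 2), round(b_exp, 2))
```

Because the deviations of log Y are minimized, the result is not the same as a direct least-squares fit of Y itself, which would need an iterative numerical method.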


Example 12.6 Fit an equation of the form $Y = aX^b$ to the following data:

  X      1      2      3      4      5      6
  Y   2.98   4.26   5.21   6.10   6.80   7.50

We may reduce the given equation to a linear form by taking logs to the base 10. Thus
   $\log Y = \log a + b \log X$
or $Y' = A + bX'$
where $Y' = \log Y$, $A = \log a$ and $X' = \log X$.
As the equation is linear in $Y' = \log Y$ and $X' = \log X$, the two normal equations are
$$\sum Y' = nA + b\sum X'$$
$$\sum X'Y' = A\sum X' + b\sum X'^2 .$$

The necessary computations give the products $X'Y'$ as 0, 0.189449, 0.341986, 0.472829, 0.581918 and
0.681003, and the totals

$n = 6$, $\sum X' = 2.8574$, $\sum X'^2 = 1.77499$, $\sum Y' = 4.3134$, $\sum X'Y' = 2.2672$.

Substituting these summations, we get
6A + 2.8574b = 4.3134
2.8574A + 1.77499b = 2.2672
Solving them simultaneously and taking anti-logs, we get
a = 2.978 and b = 0.5144
Hence the required equation is
$$Y = 2.978\, X^{0.5144}$$
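
For the power curve both variables are transformed, so the straight line is fitted to (log X, log Y). A minimal sketch, assuming X = 1, 2, ..., 6 as implied by ΣX' = 2.8574 (the logarithm of 720); the helper name `fit_power` is hypothetical.

```python
import math

def fit_power(xs, ys):
    """Fit Y = a * X**b by least squares on (log10 X, log10 Y); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(xs)
    sx, sy = sum(lx), sum(ly)
    sxx = sum(u * u for u in lx)
    sxy = sum(u * v for u, v in zip(lx, ly))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    big_a = (sy - b * sx) / n          # A = log10(a)
    return 10 ** big_a, b

xs = [1, 2, 3, 4, 5, 6]                # assumed X values
ys = [2.98, 4.26, 5.21, 6.10, 6.80, 7.50]
a_pow, b_pow = fit_power(xs, ys)
print(round(a_pow, 3), round(b_pow, 4))
```

The fitted constants should be close to a = 2.978 and b = 0.5144, with small differences due to the number of decimals carried in the logarithms.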


12.4 OTHER TYPES OF CURVES
The other types of curves frequently encountered in applied statistics are the following:

12.4.1 Modified Exponential Curve. A modified exponential curve, which is obtained by adding
a constant k to an exponential curve, is defined by the relation

$$Y = k + ab^X .$$

It describes a set of data, the absolute growth of which decreases by a constant proportion when
a is negative and b is less than one.

The first method to fit this curve is to transform it into a linear form by taking logarithms of both
sides and then to use the least-squares method. But this method is difficult for practical use. In the second
method, we need three equations, because there are three constants k, a and b which are to be determined.
The data are therefore divided into three equal parts, leaving one or two values at the beginning,
if necessary, to obtain the three equations, the criterion of fit being that the three partial totals of the trend
values must equal those of the original data.
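Let n denote the number of values in each third of the data, the values of X being taken as 0, 1, 2, ...

Then the first equation is
$$\sum\nolimits_1 Y = nk + a + ab + ab^2 + ab^3 + \ldots + ab^{n-1}$$
$$\phantom{\sum\nolimits_1 Y} = nk + a[1 + b + b^2 + b^3 + \ldots + b^{n-1}]$$
$$\phantom{\sum\nolimits_1 Y} = nk + a\left(\frac{b^n - 1}{b - 1}\right).$$

In a similar way, the other two equations are obtained as

$$\sum\nolimits_2 Y = nk + ab^n\left(\frac{b^n - 1}{b - 1}\right) \quad \text{and} \quad \sum\nolimits_3 Y = nk + ab^{2n}\left(\frac{b^n - 1}{b - 1}\right).$$

Now we find the constants k, a and b.
Subtracting the first equation from the second, we get

$$\sum\nolimits_2 Y - \sum\nolimits_1 Y = a\,\frac{(b^n - 1)^2}{b - 1}.$$

Again, subtracting the second equation from the third one, we get

$$\sum\nolimits_3 Y - \sum\nolimits_2 Y = ab^n\,\frac{(b^n - 1)^2}{b - 1}.$$

Dividing, we have

$$\frac{\sum_3 Y - \sum_2 Y}{\sum_2 Y - \sum_1 Y} = b^n,$$

which gives $b = \left[\dfrac{\sum_3 Y - \sum_2 Y}{\sum_2 Y - \sum_1 Y}\right]^{1/n}$.

Finally, $a = \left(\sum\nolimits_2 Y - \sum\nolimits_1 Y\right)\dfrac{b - 1}{(b^n - 1)^2}$ and

$$k = \frac{1}{n}\left[\sum\nolimits_1 Y - a\left(\frac{b^n - 1}{b - 1}\right)\right].$$

12.4.2 The Gompertz Curve, named after Benjamin Gompertz, is given by the equation

$$Y = ka^{b^X},$$
where k, a and b are constants. The equation is changed to the modified exponential equation by taking
logarithms of both sides. Thus
   $\log Y = \log k + b^X \log a$
or $Y' = k' + a'b^X$

where $Y' = \log Y$, $k' = \log k$ and $a' = \log a$.

The Gompertz curve, which increases first at an increasing rate, then increases at a decreasing rate
until it reaches a maximum level, is frequently used in business and actuarial work.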
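
The method of three partial totals described in 12.4.1 can be sketched in a few lines of Python. The function name `fit_modified_exponential` is hypothetical, the X values are assumed to be 0, 1, 2, ... (counted from the first value retained), and the test series is artificial data generated from k = 100, a = -50, b = 0.5 so that the method can be seen to recover known constants.

```python
def fit_modified_exponential(ys):
    """Three-partial-totals fit of Y = k + a * b**X for X = 0, 1, 2, ..."""
    extra = len(ys) % 3
    ys = ys[extra:]                    # leave one or two values at the beginning
    n = len(ys) // 3
    if n < 1:
        raise ValueError("need at least three observations")
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    ratio = (s3 - s2) / (s2 - s1)      # equals b**n
    if ratio <= 0 or ratio == 1:
        raise ValueError("partial totals do not suit a modified exponential")
    b = ratio ** (1 / n)
    a = (s2 - s1) * (b - 1) / (b ** n - 1) ** 2
    k = (s1 - a * (b ** n - 1) / (b - 1)) / n
    return k, a, b

# Artificial series (assumption) generated from k = 100, a = -50, b = 0.5.
series = [100 - 50 * 0.5 ** x for x in range(6)]
k_hat, a_hat, b_hat = fit_modified_exponential(series)
print(k_hat, a_hat, b_hat)
```

Since the Gompertz curve becomes a modified exponential in log Y, the same routine could be applied to the logarithms of the data, with k' and a' converted back by taking anti-logs.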


12.4.3 The Logistic Curve, which is widely used to represent growth, is defined by the relation

$$Y = \frac{k}{1 + ab^X}.$$

By inverting, we get

$$\frac{1}{Y} = \frac{1}{k} + \frac{a}{k}\,b^X = k' + a'b^X, \quad \text{where } k' = \frac{1}{k} \text{ and } a' = \frac{a}{k}.$$

This is similar in form to the modified exponential curve if $\dfrac{1}{Y}$ is expressed as
a function of X, and the same method of fitting can therefore be applied with the reciprocals $\dfrac{1}{Y}$ instead
of Y. The use of this curve to analyse population and biological growth was advocated by Raymond Pearl
and L.J. Reed. It should be noted that the logistic curve has four different stages, viz., (i) a period of
relatively slow growth, (ii) then a period of accelerated growth, (iii) then a period of decelerated growth
and (iv) finally a period of stability, when the curve does not go up at all. The growth of human
population and that of economic variables are appropriately described by the curve as they conform to
these stages.


12.4.4 The Makeham Curve is defined as
$$Y = k s^X b^{c^X}$$
or, in the logarithmic form,
$$\log Y = \log k + X \log s + c^X \log b$$
$$\phantom{\log Y} = A + CX + Bc^X$$
where $A = \log k$, $C = \log s$ and $B = \log b$.

This type of curve, which is actually a combination of a straight line with a Gompertz curve, is
used in actuarial and insurance work.

12.5 CRITERIA FOR A SUITABLE CURVE

Frequently, we are required to choose a suitable form of curve to obtain a reasonable fit to the
observed sets of data in two variables. The suitability of several curves may be determined by examining
the differences in the values of the dependent variable Y, taken at equally spaced values of X. The first
difference, denoted by $\Delta Y$ (read as delta Y), is defined by $\Delta Y_i = Y_{i+1} - Y_i$, the second difference is
defined by $\Delta^2 Y_i = \Delta Y_{i+1} - \Delta Y_i$, and so on. With X increasing in unit steps, a
straight line has the property that its first difference is equal to b (a constant), a second degree parabola
has the property that its second difference is equal to 2c (a constant) and, in general, a parabolic curve of
the nth degree has the property that its nth differences are constant. Thus we fit

i) a straight line, if the first differences between successive values are approximately constant;
ii) a second degree parabola, if the second differences are approximately constant;
iii) a third degree parabola, if the third differences prove to be constant;
iv) an exponential curve, if the first differences of the logarithms are approximately constant;
v) a log parabola $Y = ab^X c^{X^2}$, if the second differences of the logarithms of the Y-values tend to
be constant;
vi) a modified exponential curve, if each first difference is a constant percentage of the preceding
first difference;
vii) a Gompertz curve, if the first differences of logarithms are changing by a constant percentage;
viii) a logistic curve, if the first differences of the reciprocals are changing by a constant
percentage; and
ix) a reciprocal line $\dfrac{1}{Y} = a + bX$, if the reciprocals of the data show a straight line when plotted
on a graph.
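
The difference criteria above are easy to tabulate. The sketch below uses a hypothetical helper `differences` and two artificial series, both assumptions for illustration: a parabola Y = 2 + 3X + 0.5X², whose second differences should all equal 2c = 1, and an exponential Y = 3(2^X), whose first differences of logarithms should all equal log 2.

```python
import math

def differences(values, order=1):
    """Return the differences of the given order of a sequence."""
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values

# Artificial series (assumptions), X = 0, 1, ..., 5 in unit steps.
parabola = [2 + 3 * x + 0.5 * x * x for x in range(6)]
exponential = [3 * 2 ** x for x in range(6)]

second_diffs = differences(parabola, 2)                               # constant, equal to 2c
log_first_diffs = differences([math.log10(y) for y in exponential])   # constant, equal to log10(2)
print(second_diffs)
print([round(d, 5) for d in log_first_diffs])
```

With observed data the differences are only approximately constant, so the criteria are a guide to the choice of curve and not a formal test.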
12.6 FINDING PLAUSIBLE VALUES BY THE PRINCIPLE OF LEAST-SQUARES
The principle of least squares may also be applied to find the most satisfactory values of
unknown quantities from a set of independent linear equations in the unknowns when the number of
equations is greater than the number of unknowns.

Suppose there are k unknown quantities $X_1, X_2, \ldots, X_k$ and let the n observed relations, where n > k,
be

$$a_1 X_1 + b_1 X_2 + \ldots + f_1 X_k = l_1$$
$$a_2 X_1 + b_2 X_2 + \ldots + f_2 X_k = l_2$$
$$\vdots$$
$$a_n X_1 + b_n X_2 + \ldots + f_n X_k = l_n$$
where a's, b's, ..., l's are constants.

When n > k, i.e. the number of equations is greater than the number of unknowns, there does not, in
general, exist a set of values that satisfies all the equations exactly. In such cases, we therefore try to find
those values of $X_1, X_2, \ldots, X_k$ which simultaneously satisfy the given set of independent linear equations
as nearly as possible. Such values are obtained by the least-squares method and are called the best or most
plausible values.

The least-squares criterion calls for the selection of those values of $X_1, X_2, \ldots, X_k$ which make the
sum of squares of the discrepancies $D_i$'s, also called errors or residuals, a minimum, where

$$D_i = a_i X_1 + b_i X_2 + \ldots + f_i X_k - l_i, \quad i = 1, 2, \ldots, n.$$

In other words, we have to select those values of $X_1, X_2, \ldots, X_k$ which minimize

$$S = \sum_{i=1}^{n} D_i^2 = \sum_{i=1}^{n} (a_i X_1 + b_i X_2 + \ldots + f_i X_k - l_i)^2 .$$

It is obvious that $S = f(X_1, X_2, \ldots, X_k)$, that is, the sum of squares of residuals is some function of
$X_1, X_2, \ldots, X_k$. If S is to have a minimum value, it is necessary that its partial derivatives with respect to
$X_1, X_2, \ldots, X_k$, if they exist, vanish there; hence $X_1, X_2, \ldots, X_k$ must satisfy the equations

$$\frac{\partial S}{\partial X_1} = 2\sum a_i (a_i X_1 + b_i X_2 + \ldots + f_i X_k - l_i) = 0$$
$$\frac{\partial S}{\partial X_2} = 2\sum b_i (a_i X_1 + b_i X_2 + \ldots + f_i X_k - l_i) = 0$$
$$\vdots$$
$$\frac{\partial S}{\partial X_k} = 2\sum f_i (a_i X_1 + b_i X_2 + \ldots + f_i X_k - l_i) = 0$$

The equations given above may be written in standard form as
$$X_1 \sum a_i^2 + X_2 \sum a_i b_i + \ldots + X_k \sum a_i f_i = \sum a_i l_i$$
$$X_1 \sum a_i b_i + X_2 \sum b_i^2 + \ldots + X_k \sum b_i f_i = \sum b_i l_i$$
$$\vdots$$
$$X_1 \sum a_i f_i + X_2 \sum b_i f_i + \ldots + X_k \sum f_i^2 = \sum f_i l_i$$

These simultaneous equations, obtained by the minimizing process, are the normal equations which are
simultaneously solved to obtain the best or the most plausible values of $X_1, X_2, \ldots, X_k$.

It should be noted that the normal equation for each variable is obtained by multiplying each
equation by the co-efficient of that variable in the equation and adding the results together. This is a
convenient way of remembering the normal equations.


Example 12.7 Apply the principle of least-squares to solve
2X + Y = 0, 3X - 2Y = 0, -X + Y = -2. (P.U., B.A./B.Sc. 1971, 75)

There are 3 linear equations and 2 unknown variables X and Y, therefore we apply the least-squares
method to get the most plausible values of X and Y.

Now $S = (2X + Y - 0)^2 + (3X - 2Y - 0)^2 + (-X + Y + 2)^2$

The normal equations are $\dfrac{\partial S}{\partial X} = 0$ and $\dfrac{\partial S}{\partial Y} = 0$,
i.e. 2(2X + Y) + 3(3X - 2Y) - (-X + Y + 2) = 0
and (2X + Y) - 2(3X - 2Y) + (-X + Y + 2) = 0
or 14X - 5Y = 2 and -5X + 6Y = -2.

Solving these equations simultaneously, we get

X = 0.034, and Y = -0.305.


Example 12.8 Find the most plausible values of X and Y from the following equations:
X - Y - 3 = 0
3X + 2Y - 4 = 0
2X - 3Y + 1 = 0

(P.U., B.A. (Hons.) Part-I, 1963, B.A./B.Sc. 1
We first find the normal equation for X. Multiplying each equation by the co-efficient of X in it, we
have

X - Y = 3
9X + 6Y = 12
4X - 6Y = -2

Adding, we get 14X - Y = 13, which is the normal equation for X.

We then find the normal equation for Y. Multiplying each equation by the co-efficient of Y in
it, we get
-X + Y = -3
6X + 4Y = 8
-6X + 9Y = 3

Adding them together, we get -X + 14Y = 8 as the normal equation for Y.
Thus the two normal equations are

14X - Y = 13
-X + 14Y = 8
Solving them simultaneously, we obtain

X = 0.97 and Y = 0.64

which is the required solution.
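
The rule "multiply each equation by the co-efficient of the variable and add" can be written as a short routine. The sketch below is illustrative only: `plausible_values` and `gauss_solve` are hypothetical helper names, and the input is the system of Example 12.8 written with the constants on the right-hand side.

```python
def gauss_solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def plausible_values(coeffs, consts):
    """Form the normal equations of an over-determined system and solve them."""
    k = len(coeffs[0])
    # Row i: every equation multiplied by its coefficient of unknown i, then added.
    normal = [[sum(row[i] * row[j] for row in coeffs) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * l for row, l in zip(coeffs, consts)) for i in range(k)]
    return normal, rhs, gauss_solve(normal, rhs)

# Example 12.8: X - Y = 3, 3X + 2Y = 4, 2X - 3Y = -1.
normal, rhs, (x_hat, y_hat) = plausible_values([[1, -1], [3, 2], [2, -3]], [3, 4, -1])
print(normal, rhs)
print(round(x_hat, 2), round(y_hat, 2))
```

The normal equations produced should be 14X - Y = 13 and -X + 14Y = 8, as found above; the same routine applies unchanged to systems in three or more unknowns.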
EXERCISES
12.1 a) What is meant by Curve Fitting? (P.U., B.A./B.Sc.
b) Explain the principle of Least Squares with particular reference to a straight line fit. In what
sense does it give the "best" solution? (P.U., B.A./B.Sc.
c) Fit a straight line to the following data:

Calculate the values of Y for each value of X, obtain the values of residuals $e_i$'s and check
that $\sum e_i = 0$.

12.2 a) By means of Least Squares, show how a straight line can be fitted to a set of given
observations, and obtain the normal equations.
b) Prove that a least squares line always passes through the point $(\bar{X}, \bar{Y})$.
(P.U., B.A./B.Sc. 1978)
c) Fit a straight line to the following data and plot on the graph paper the actual and calculated
values.

12.3 a) Write down the equation of a straight line through the origin and derive an expression for
finding its slope by the principle of least squares. (P.U., B.A./B.Sc. 1991)
b) Fit a least-squares line to the following data:

Measure the deviations from the line and find the sum of squared deviations.

12.4 a) Find the normal equations and determine the values of a and b in the least squares line
Y = a + bX; and show that the sum of squares of residuals from the least squares line is given
by
$$S = \sum Y^2 - a\sum Y - b\sum XY$$
b) Fit a straight line to

(P.U., B.A./B.Sc. 1962, 80)

12.5 a) Fit the least squares line for 20 pairs of observations having $\bar{X} = 2$, $\bar{Y} = 8$, $\sum X^2 = 180$ and
$\sum XY = 404$. (P.U., B.A./B.Sc. 1986)
b) Fit Y = a + bX by least-squares and estimate Y for X = 6. Then fit X = c + dY and use this equation
to estimate Y for X = 6. Account for the difference in two estimates.
(P.U., B.A. (Hons.) Part-II, 1963-5)

Given

12.6 a) Find the normal equations for a, b and c that will minimize
$$S = \sum [Y - (a + bX + cX^2)]^2$$
b) Show that the sum of squares of residuals for a second degree parabola is
$$S = \sum Y^2 - a\sum Y - b\sum XY - c\sum X^2 Y$$

c) Fit a parabola of the form $Y = a + bX + cX^2$ to the data:
X 


[p 


12.7 a) By means of the principle of least squares, show how a parabola of second order can be
fitted to a set of n observations $(X_i, Y_i)$ and obtain the normal equations.

b) For 5 pairs of observations, it is given that A.M. of X series is 2 and A.M. of Y series is ... It
is also known that

$\sum X^2 = 30$, $\sum X^3 = 100$, $\sum X^4 = 354$, $\sum XY = 242$, $\sum X^2 Y = 850$.
Fit a second degree parabola, taking X as the independent variable.


Pas [de is 2 25 39 35 30] 
11:13 16 20 27 34 44 
% 


(P.U., В.А. (Hons.) 
12.10 The profits, £Y, of a certain company in the Xth year of its life are given by:

Taking u = X - 3, v = (Y - 1650)/50, show that the parabolic curve of v on u is
$$v + 0.086 = 5.30u + 0.643u^2,$$
and deduce that the parabolic curve of Y on X is
$$Y = 1140 + 72.14X + 32.14X^2 .$$ (P.U., B.A.
12.11 Fit, by the method of least-squares,
i) the straight line of best fit,
ii) the 2nd degree parabola of best fit, to the following data:

  X    20    25    30    35    40    45    50    55
  Y   240   315   403   450   488   520   525   532

Also calculate the sum of squares of residuals in the two cases.


and calculate the sum of squares of residuals in the three cases. 


12.13 a) You are given data in two variables X and Y and you have to take a decision about fitting a
suitable trend. How will you proceed?
(P.U., B.A./B.Sc. 1987)


pairs of values of X and Y. 
1 


17 


b) Given the following 


Fit a suitable curve. 9 (P.U., B.AJB.Sc. 1976) 


12.14 a) Explain the principle of least squares and use it to obtain the normal equations when a cubic
parabola is fitted to n pairs of observations.

b) Estimate Y when X = 5 and 6.
(P.C. 1972; P.U., B.A./B.Sc. 1978)

12.15 The number (Y) of bacteria per unit volume present in a culture after X hours is given in the
following table:
No. of hours (X)

No. of bacteria per unit volume (Y)    47    65    92    132    190    275

Fit a least-squares curve having the form $Y = ab^X$ to the data. Estimate the value of Y when X = 7.
(P.U., B.A./B.Sc. 1969, 79, 80)
12.16 Fit a simple exponential to the following data for a growing plant by taking the logarithms of the
exponential equation.


0.75 1.20 1.75 2.50 3.45 4.70 6.20 8.25 11.50 
y] 


12.2. 14.5 173 21.0 25.0 29.0 


12.18 The following data represent the enrolments at a d liberal arts college during the past 


years: 
Y (enrolments) 304 341 393 457 548 670 882 


Use the method of least-squares to estimate a curve of the format and predict the 


years from now. 9 (P.U., В.А./В.ӛс. 1% 
12.19a) Given n=8, XX =16, ХХ? -204, Z X? = 582, © ү-23, ХХ log Y-104. Fita 
curve. % (P.U., В.А./В.ӛс. Hons. 


b) 


3 4 9546 


"3 1200 900 600 200 110 50 


12.21 Fit a curve of the uu to the following data on the unit cost in dollars of producing: 
electronic components and бе number of units produced. 


50 100 250 500 1000 


Use the result to estimate the unit cost for a lot of 400 components. 
` (P.U., B.A./B. 


12.22 It is thought that two physical quantities X and Y should be connected by a relation of the form
$Y = aX^n$. The experimental values are:


| x |05 15 25 50 100 


ИШЕ 34 70 128 298 682 


Find the best aus of a anó n. (P.U., B.AJB. 




4 The discharge of a capacitor thro — gave he following results: 


| t(seconds) |05 08 14 20 25 
| уон) | 91 85 75 67 61 


Fit a curve of the type v ae" to these data. 
уре Y=ae™ to the following data: 
4 


200 545 


4а)  Fitacurve ofthe 


5 
1484 


where e = lim il $ 
n n 


b) Obtain the values of Y from the approximating line for various values of X. Do the deviations | 
of the observed values of Y from the corresponding calculated values add to zero? Explain 


your result. I 
< (PU, B.AJB.Sc. 1977) 
Estimate the constant of Pareto Curve, п = AX ^, which fits the ata low: 


The pressure (p) of a gas and its volume (v) are known to be related by an equation of the form
$pv^{\gamma}$ = constant. From the following data, find the value of $\gamma$ by fitting a straight line to the
... independent variable.


p | 0.5  1.0  1.5  2.0  2.5  3.0
v | 1.62 1.00 0.75 0.62 0.52 0.46


a) Derive the least-squares equations for fitting a curve of the type $\frac{1}{Y} = a + bX$ to a set of n
observations. Also find the values of a and b.


b) Fit a reciprocal curve $\frac{1}{Y} = a + bX$ to the following data:


Оку а. 12216
en ee 25 "3
a) Find the normal equations for determining a, b and c from the linear equation $Y = a + bX_1 + cX_2$.


b)


12.29 a) What is the modified exponential curve? Describe the method of fitting it. 
(P.U., B.A. (Hons.) Part-II, 1 


b) Derive the least-squares equations for fitting a modified exponential, $Y = c + ae^{bX}$, to a set
of n observations, and indicate why these equations would be difficult to solve.


12.30 Write a critical note on the law of growth as portrayed by the logistic curve and the Gompertz
curve. (P.U., B.A./B.Sc. 1


12.31 Use the "principle of least-squares" to find the normal equations when the number of equations is
greater than the number of unknown quantities.
(P.U., B.A./B.Sc. 1981, 84, 86,


12.32 a) Explain the method of least-squares. Apply it to solve the equations


X + 7Y = 17, 2X - Y = 0, 3X - 2Y = -1 (P.U., B.A...
b) Find the most plausible values of X and Y from the following equations. Also compute
the sum of squares of residuals.
2X + Y=4.8, -X +3, 
3X-2Y=-2.1, ЗХР = 8.0, (P.U., В.А /В.5с. 198 
12.33 a) Find the most plausible values of X and Y from the following equations:
X + Y = 3.01, 2X - Y = 0.03,


X + 3Y = 7.02, 3X + Y = 4.97.
b) "Obtain the best reat values of X and Y from 


2x -Y Ny 3X- Y = 10.02, 

X+2Y=5.02, 3X + 2Y 2 097. (P.U., В.А /B.Sc. 
12.34 Form normal equations and solve 

X+2Y+Z=1, 2X+Y+Z=4, 

-X + Y + 2Z = 3, 4X + 2Y - 5Z = -7. (P.U., B.A./B.Sc. 1962,
12.35 Find the most plausible values of X, Y and Z from the following equations:

X - Y + 2Z = 3, 3X + 2Y - 5Z = 5,

4X + Y + 4Z = 21, -X + 3Y + 3Z = 14.


(P.U., B.A./B.


13.1 INTRODUCTION


A time series consists of numerical data collected, observed or recorded at more or less regular
intervals of time each hour, day, month, quarter or year. More specifically, it is any set of data in which
observations are arranged in a chronological order. Examples of time series are the hourly temperature
recorded at a locality for a period of years, the weekly prices of wheat in Lahore, the monthly
consumption of electricity in a certain town, the monthly total of passengers carried by rail, the quarterly
sales of a certain fertilizer, the annual rainfall at Karachi for a number of years, the enrolment of students
in a college or university over a number of years and so forth.

The analysis of a time series is a process by which a set of observations in a time series is analysed.

Time series analysis is rather a difficult topic but we shall limit our discussion to the basics of time series
analysis.
The observations in a time series, denoted by $Y_1, Y_2, \ldots, Y_t, \ldots$, are usually made at equally spaced
points of time or they are associated with equal intervals of time (t). Given an observed time series, the
first step in analysing a time series is to plot the given series on a graph taking time intervals (t) along the
X-axis, as the independent variable, and the observed values ($Y_t$) on the Y-axis, as the dependent variable.
Such a graph will show various types of fluctuations and other points of interest.


It is worthwhile to note that the middle of the period is taken to represent the data for that period.
For example, the yearly data corresponds to June 30 or July 1, the middle of a calendar year, and monthly
data to the middle of the month, i.e. the 15th day of the month.


Example 13.1 The following table shows the number of bags (hundreds) of fertilizer sold by a
certain dealer. Plot these data as a time series and comment.


Year    Quarter I   Quarter II   Quarter III   Quarter IV
2001        72          98           79           106
2002        79         122          101           143
2003        94         141          128           160
2004       125         143          135           187


[Figure: Historigram showing sales of fertilizer for 4 years. Y-axis: bags sold (hundreds).]


The historigram obtained by plotting the given time series shows that the sales have risen for the
second quarter, have fallen for the third quarter and then have risen to a higher point for the fourth quarter
every year. The graph also reveals that the sales, on the whole, have risen over 4 years. The graph also
suggests that by smoothing out the irregularities, the annual rate at which the sales have increased may
be ascertained.


13.2 COMPONENTS OF A TIME SERIES


A typical time series may be regarded as composed of four basic types of movements, usually
called components of a time series. The four components are: secular trend (T), seasonal variations (S),
cyclical fluctuations (C) and irregular or random variations (I). These components are assumed to be the
outcomes of distinct causes of variation. All four of these components are not necessarily present in a
time series occurring in practice. Let us discuss each of these components in turn.


13.2.1 Secular Trend. A secular trend (T) is a long-term movement that persists for many years
and indicates the general direction of the change of observed values. In other words, it refers to a smooth
broad movement of a time series in the same direction, showing a rise or fall within the data. The
secular trend generally dominates other variations in the long run (over a fairly long period of time,
not less than 10 years). The long-term trend is a peculiar characteristic of most of the economic variables
such as sales (see figure), prices, industrial production, capital formation, etc.


[Figure: Graph showing the long-term trend of sales by year, 2001 to 2004.]


Analysing the trend component helps in determining
the rate of change to be used for further estimates. It also helps in business planning and in ...


13.2.2 Seasonal Variations. The seasonal variations (S), which are mainly caused by the change
in seasons, are short-term movements occurring in a periodic manner. These fluctuations are repeated
with more or less the same intensity within a specific period of one year or shorter (see figure).


[Figure: Graph showing sales by quarters for 4 years, one curve per year. X-axis: quarter (I to IV).]


The main causes for seasonal variations are the weather conditions, the religious festivals and the social
customs. Examples of seasonal variations are the prices of wheat which fall after the harvesting season
and rise before the sowing time, the sales of cold drinks which are high in the summer and low in the
winter, investments in Savings Certificates which are high in the months of May and June and low in
other months, and so forth. The concept of seasonal variation is customarily broadened to include the
more or less regular fluctuations of shorter duration occurring within a day, a week, a month, a quarter
and so forth. Examples of such variations are the daily variations in temperature or the monthly variations
in deposits.


13.2.3 Cyclical Fluctuations. The cyclical fluctuations (C) are the long-period oscillations about
the long-term trend, which tend to occur in a more or less regular pattern over a period of a certain number
of years. The so-called business cycles, which represent alternating periods of prosperity and depression,
are an important example of cyclical movements. A cycle, as it is known, is said to be completed
when, beginning with a peak (a peak is a value which is greater than the two neighbouring values), the
curve reaches a low point, called a trough (a trough is a value which is lower than the two
neighbouring values) and then rising again reaches the next peak. The period either from peak to peak or
from trough to trough is usually referred to as the duration of a cycle. Cycles have a duration of
from two to ten years or even a longer period. In general, a complete business cycle has the
following four phases: (i) the period of prosperity, (ii) the period of contraction, (iii) the period of
recession or depression and (iv) the period of recovery or expansion, which finally develops into a period
of prosperity.


[Figure: Graph showing phases of a cycle.]


13.2.4 Irregular or Random Variations. These variations are irregular and unsystematic in
nature. They occur in a completely unpredictable manner as they are caused by some unusual events such
as floods, droughts, strikes, fires, earthquakes, wars, locusts, political events and the like. These
variations are also known as the accidental, residual or erratic variations. It is difficult to make a
forecast of such non-recurring variations, though they can be identified.


13.3 TIME SERIES DECOMPOSITION
A time series analysis is mainly concerned with the decomposition of the observed series
into its components so as to estimate the separate effects. To do this, we must make assumptions
about the relationship existing among the components. Accordingly, these components are assumed
to follow either the multiplicative relationship (also called the multiplicative model) or the additive
relationship (the additive model).

In the multiplicative (decomposition) model, we assume that each observed value $Y_t$ at any time t is
determined by the product of the measures of all the four components T, C, S and I. Symbolically,

$$Y_t = T \times S \times C \times I = TSCI,$$

where the observed values and T-values are stated in original units but the other components, i.e. S, C and
I, are expressed in percentages. On the other hand, in the additive (decomposition) model, each value
we observe is thought of as being a sum of all the four components, i.e.

$$Y = T + S + C + I,$$

where the components T, S, C and I are assumed to be mutually independent. Before analysing a time
series, it is often desirable to adjust the data for calendar variations, for holidays, for price changes and so
forth. Such adjustments help in removing the effects of certain false differences.


13.4 ANALYSING THE SECULAR TREND


The analysis of a trend component involves its measurement and/or elimination from an observed 
time series data. To measure a trend which can be represented as a straight line or some type of smooth 
curve, the following methods are used. 


i) The method of freehand curve. 
ii) The method of semi-averages. 
iii) The method of moving averages. 
iv) The method of least-squares. 


13.4.1 The Method of Freehand Curve. Plot the given data on a graph paper and join the plotted
points by segments of straight line. Observe the up and down movements on the graph and draw a smooth
curve or a straight line freehand passing through the plotted points in a way such that the general
direction of change in values is indicated. This line may also be drawn by a transparent ruler or by
stretching a piece of thread through the central region of the plotted points. The line smoothes out
short-term fluctuations. The trend values for the given periods can be read from the graph.


This method is simple and quick. It will be a close approximation to a mathematically based trend
if drawn with care. It has certain disadvantages. It is a rough and subjective method. As the drawing of
the line depends on individual judgment and experience, different persons will draw the graphical trend at
different positions with different slopes. Moreover, considerable practice is needed to make a good fit.


13.4.2 The Method of Semi-Averages. Divide the values in the series into two equal parts,
including the middle value in each half or omitting it when the number of values is odd. Find
the average of the values in each part and place the average values against the respective midpoints of the
two parts. Plot these two average values on the graph of the original values, draw a straight line
connecting the two points and extend the line to cover the whole series. This is the semi-average trend
line which is to be used to read or to compute the trend values.


This method is simple and quick. It gives an entirely objective result when the trend is a straight
line. It has two disadvantages. The arithmetic mean, which is used to average the observed values, is
unduly affected by abnormally small or large values. The method is only suitable when the trend is linear
or approximately linear.
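
The semi-average procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function name `semi_average_trend` and the 0-based time positions are assumptions made here, and when the number of values is odd the middle value is omitted, which is one of the two options the method allows. The two series are those of Example 13.2.

```python
def semi_average_trend(values):
    """Return the two semi-averages, the slope per period and the trend values.

    Assumption for this sketch: when len(values) is odd, the middle value
    is omitted (the method also allows including it in both halves).
    """
    n = len(values)
    half = n // 2
    first = values[:half]
    second = values[n - half:]           # skips the middle value when n is odd
    mean_first = sum(first) / half
    mean_second = sum(second) / half
    # 0-based positions of the midpoints of the two halves
    centre_first = (half - 1) / 2
    centre_second = (n - half) + (half - 1) / 2
    slope = (mean_second - mean_first) / (centre_second - centre_first)
    # Extend the line through the two semi-averages over the whole series
    trend = [mean_first + slope * (t - centre_first) for t in range(n)]
    return mean_first, mean_second, slope, trend


# Series of Example 13.2: (a) annual profits, (b) indoor patients ('000)
profits = [85, 97, 100, 90, 83, 105, 112, 120]
patients = [276, 270, 260, 286, 302, 321, 351, 348, 346]

profit_result = semi_average_trend(profits)
patient_result = semi_average_trend(patients)
print(profit_result[:3])    # semi-averages and slope for the profits
print(patient_result[:3])   # semi-averages and slope for the patients
```

The slope is simply the difference between the two semi-averages divided by the number of periods separating their midpoints, so the trend value for any period can be read off the line.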
Example 13.2 Compute and insert on graphs the semi-average trends for the following series:


a) Annual profits in thousands of rupees in a certain business


Year:     1973 1974 1975 1976 1977 1978 1979 1980
Profits:    85   97  100   90   83  105  112  120
b) Indoor patients ('000) treated in hospitals in the Punjab


Year:     1967 1968 1969 1970 1971 1972 1973 1974 1975
Patients:  276  270  260  286  302  321  351  348  346


(Source: Bureau of Statistics, Govt. of the Punjab, 1977)


a) The data are divided into two equal parts, each consisting of 4 years. Having found the simple
averages of the two parts, the first average value is placed opposite the middle of 1974 and
1975, and the second average value is placed opposite the middle of 1978 and 1979. These
values are then plotted on the graph of the observed series. The line joining these points gives
the semi-average trend.


First half:  372 ÷ 4 = 93


Second half: 420 ÷ 4 = 105


[Figure: Graph showing annual profits and the semi-average trend. Key: original values; semi-average trend.]
b) Here the number of years is odd, therefore the middle value is omitted in order to divide the
values in the series into two equal parts. Having computed the simple averages of the two
parts, they are placed opposite the respective midpoints of the two parts. They are then plotted on
the graph of the original values. The line connecting these points gives the semi-average trend.


First half:  1092 ÷ 4 = 273


Second half: 1366 ÷ 4 = 341.5


[Figure: Graph showing number of indoor patients and the semi-average trend.]


13.4.3 The Method of Moving Averages. The k-period moving averages are defined as the
averages calculated by using the k consecutive values of the observed series, then repeating the operation
by dropping one value at the beginning and including the first value after the preceding total, and so on,
moving on one value to calculate each successive average. This process is continued till the last k
consecutive values have been averaged.


Symbolically, the values of the k-period moving averages, denoted by $a_1, a_2, a_3, \ldots$, will be as given below:

$$a_1 = \frac{1}{k}\sum_{t=1}^{k} Y_t,\quad a_2 = \frac{1}{k}\sum_{t=2}^{k+1} Y_t,\quad a_3 = \frac{1}{k}\sum_{t=3}^{k+2} Y_t, \text{ and so on.}$$

In practice, the moving averages may be obtained by the relations:

$$a_2 = a_1 + \frac{Y_{k+1} - Y_1}{k},\quad a_3 = a_2 + \frac{Y_{k+2} - Y_2}{k}, \text{ and so on.}$$

Each k-period moving average is placed against the middle of its time period. It is relevant to note that the
average will correspond directly to an observed value in the series when k is odd, and when k is even, the
moving average will fall between two periods. It is then necessary to
shift these averages so that they should coincide in time with the observed values in the series. To shift,
technically speaking to centre, each average, a 2-period moving average of the already computed k-period
moving average is calculated. Then it is called a k-period centred moving average.
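
As a hedged illustration of the definitions above, the following Python sketch computes simple k-period moving averages with the update relation (drop the oldest value, add the newest) and then centres them for an even k. The function names are assumptions of this sketch; the sales figures are those of Example 13.1.

```python
def moving_average(values, k):
    """Simple k-period moving averages, using a_{j+1} = a_j + (Y_new - Y_old) / k."""
    total = sum(values[:k])
    averages = [total / k]
    for i in range(k, len(values)):
        total += values[i] - values[i - k]   # drop the oldest value, add the newest
        averages.append(total / k)
    return averages


def centred_moving_average(values, k):
    """Centred moving averages for an even k: a 2-period moving average
    of the simple k-period moving averages."""
    simple = moving_average(values, k)
    return [(simple[i] + simple[i + 1]) / 2 for i in range(len(simple) - 1)]


# Quarterly sales (hundreds of bags) from Example 13.1, 2001 to 2004
sales = [72, 98, 79, 106, 79, 122, 101, 143,
         94, 141, 128, 160, 125, 143, 135, 187]

three_period = moving_average([1, 2, 3, 4, 5], 3)
centred = centred_moving_average(sales, 4)
print(centred[0])   # centred on the third quarter of 2001
```

The first centred value is aligned with the third observation, so a 4-quarter centred moving average loses two values at each end of the series.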


For the purposes of illustration, let us first choose k to be odd, say, k = 3 years. Then we compute the
moving averages as

$$a_1 = \tfrac{1}{3}(Y_1 + Y_2 + Y_3),$$
$$a_2 = \tfrac{1}{3}(Y_2 + Y_3 + Y_4),$$
$$a_3 = \tfrac{1}{3}(Y_3 + Y_4 + Y_5),$$

and so on. The averages so obtained are placed opposite the middle year of each group, i.e. opposite the
2nd year, 3rd year, 4th year, and so on. This process is continued till the last 3 consecutive values have
been averaged.

Next, we choose k to be even, say, k = 4 years. Then the 4-year moving averages are computed as

$$a_1 = \tfrac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4),$$
$$a_2 = \tfrac{1}{4}(Y_2 + Y_3 + Y_4 + Y_5),$$
$$a_3 = \tfrac{1}{4}(Y_3 + Y_4 + Y_5 + Y_6),$$

and so on. A 4-year centred moving average, $a'_1$, is then

$$a'_1 = \tfrac{1}{2}(a_1 + a_2) = \tfrac{1}{8}\left[(Y_1 + Y_2 + Y_3 + Y_4) + (Y_2 + Y_3 + Y_4 + Y_5)\right]$$
$$= \tfrac{1}{8}\left[Y_1 + 2Y_2 + 2Y_3 + 2Y_4 + Y_5\right].$$

That is, a 4-year centred moving average is clearly equivalent to a 5-year weighted moving average
with weights 1, 2, 2, 2, 1 respectively. Similarly, the 12-month centred moving average can be obtained
by adding the observations for the 13 consecutive months with the 11 central months being counted
twice and then dividing the weighted sum by the sum of the weights, i.e. 24. In general, a k-period (k is even)
centred moving average is equivalent to a (k + 1)-period weighted moving average where the (k - 1)
central periods are given double weights.


These average values are plotted on the same graph as the original values and the line connecting
these points is the moving average trend, which smoothes out periodic fluctuations of the seasonal and
other types present in the series. The line may be extended in the general direction indicated by the
plotting for the purposes of future estimates.


The period of the moving average is chosen in a way that the period over which the fluctuations
occur is covered. This is usually the period of at least one cycle. For example, when the
data are made up of four quarters or 12 months, 4-quarter or 12-month moving averages are
computed. When the trend is of exponential type, the moving averages are to be computed by using the
geometric mean instead of the arithmetic mean.

The method of moving averages is easy and simple. The moving averages of appropriate period
estimate the combined effects of trend and cyclical components and give a smooth version of the series
by removing the seasonal and other effects. It has a number of disadvantages. The moving average method
does not provide values at the end or the beginning of the original series by half the period. The
moving averages are unduly affected by large Y-values. This disadvantage may be reduced by using the
geometric mean. In the absence of an appropriate period, moving averages may make the smoothed series
more cyclical than the observed series. It also has the drawback that the method does not provide a
mathematical expression for the trend, therefore the estimation by extrapolation is subjective.


Example 13.3 Compute (i) 3-year moving averages, (ii) 5-year moving averages and (iii) 7-year
moving averages for the following data and show the moving-average trends on the graph.
Year:   1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Values: 18.0 20.5 19.6 24.2 27.8 25.1 25.9 30.2 34.0 36.0
Year:   1990 1991 1992 1993 1994 1995 1996 1997
Values: 35.0 35.8 40.9 48.4 55.6 60.4 48.6 68.7


The computation of the 3, 5 or 7-year simple moving averages consists of two steps:
(i) computation of the 3, 5 or 7-year moving totals and (ii) division of these moving totals by 3, 5 or 7 to
obtain the moving averages.


Computation of Moving Averages


The data and the moving averages are plotted on the graph as below:

[Figure: Original values with the moving-average trends. Key: original values; 3-year moving averages; other moving averages.]


Example 13.4 Compute the 4-quarter centred moving averages for the time series given in
Example 13.1, and show them and the data on a graph.


The 4-quarter centred moving averages appear in the last column of the following table. These are
the trend values. They are plotted on the graph of original observations and the trend is shown dashed.


4-quarter centred moving averages: column (5) = column (4) ÷ 8


[Figure: Graph showing the 4-quarter centred moving average trend.]


13.4.4 The Method of Least-Squares. A trend line, which can be described by a mathematical
equation of the form of a straight line, a parabola or an exponential, can be measured using the method of
least-squares, by letting time be the independent variable. Suppose that the long-term trend
appears to be linear. Then the equation of the least-squares line would be

$$\hat{Y} = a + bt.$$

The values of a and b, the two constants, are determined by solving the following two normal
equations simultaneously:

$$\sum Y = na + b\sum t,$$
$$\sum tY = a\sum t + b\sum t^2.$$

The trend values are computed from this equation by substituting the values of t and are plotted on
the graph of the original values.


To simplify computations, the time variable (t) is coded by taking the time deviations from the
middle point of the periods and the coded time period is denoted by X. For example, when the number of
years is odd, the middle year is taken as 0 or the origin, and when the number of years is even, X = 0 is
placed at the middle of the two middle years. In the latter case, for convenience, the fractional values of X
may be multiplied by 2 to express the time units in half year units. For an illustration, the coded year
values X are shown in the table:

Sometimes, the numbers 1, 2, 3, ..., may also be assigned to X. The coded form of the linear trend
would be $\hat{Y} = a + bX$ and the two normal equations would reduce to

$$\sum Y = na \quad\text{and}\quad \sum XY = b\sum X^2.$$

When the long-term trend of a time series graph appears to be a curve, we fit a parabolic curve of
the second or third degree or a curve of some other type such as the exponential, modified exponential,
Gompertz, logistic, etc. It should be kept in mind that a trend which has the smaller $\sum (Y - \hat{Y})^2$ is a
better fit.

This method is mathematically sound and is therefore useful for prediction purposes. It is an
objective method. The method has two disadvantages. It is sometimes difficult to choose a suitable
mathematical equation. In case of non-linear trends, the method may involve heavy calculations.
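
A minimal sketch of the coded-time computation follows, assuming the reduced normal equations ΣY = na and ΣXY = bΣX² that hold when ΣX = 0. The helper names `coded_time` and `linear_trend` are illustrative only; the data are the quarterly sales of Example 13.1, for which Example 13.6 fits the same kind of line.

```python
def coded_time(n):
    """Coded time values X with sum zero.

    Odd n:  ..., -2, -1, 0, 1, 2, ...   (origin at the middle period)
    Even n: ..., -3, -1, 1, 3, ...      (origin between the two middle periods,
                                         measured in half-period units)
    """
    if n % 2 == 1:
        return [t - n // 2 for t in range(n)]
    return [2 * t - (n - 1) for t in range(n)]


def linear_trend(values):
    """Least-squares line Y = a + bX using the reduced normal equations."""
    n = len(values)
    x = coded_time(n)
    a = sum(values) / n                                    # from sum(Y) = n a
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return a, b, x


sales = [72, 98, 79, 106, 79, 122, 101, 143,
         94, 141, 128, 160, 125, 143, 135, 187]
a, b, x = linear_trend(sales)
trend = [a + b * xi for xi in x]
print(round(a, 2), round(b, 2))
```

Because the coded X values sum to zero, the trend values always add up to the total of the observed values, which is a convenient check on the arithmetic.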


Example 13.5 Determine the trend line by the least-squares method from the following data. Plot
the actual values and the linear trend on the same graph.


Let the equation of the linear trend be $\hat{Y} = a + bX$. Since the number of years in the data is odd, we
can assign X = 0 to the middle year 1999, X = 1, 2, 3, 4 to the successive years and X = -1, -2, -3, -4 to the
preceding years. The normal equations then reduce to

$$\sum Y = na \quad\text{and}\quad \sum XY = b\sum X^2 \quad (\text{as } \sum X = 0).$$

Hence the required equation of the linear trend is Y, 291.7 X,
where the middle year 1999 is taken as X = 0 and units of X are 1 year. The trend values are obtained by
substituting the values of X corresponding to various years into the equation, and are shown in the last
column of the table shown above. The total of the trend values agrees with the total of the observed values.
The original values and the trend line are graphed below.

Example 13.6 Use the method of least-squares to determine the linear trend line for the data in
Example 13.1. Compute the trend values and make an estimate of the number of bags of fertilizer sold in
the 1st and 2nd quarters of 2005.


Let the equation of the linear trend line be $\hat{Y} = a + bX$. Since the number of quarters in the
observed series is even, therefore the middle point of the two middle quarters, i.e. the middle of
quarter IV of 2002 and quarter I of 2003, is taken as X = 0. We then assign X = -1, -3, -5, ..., to the
preceding quarters and X = 1, 3, 5, ..., to the following quarters. For computing the values of a and b in
the trend line, the necessary calculations are shown in the table below:


"Computation for linear trend with even number of values"

Since $\sum X = 0$, the normal equations take the form

$$\sum Y = na \quad\text{and}\quad \sum XY = b\sum X^2.$$

Substituting the values, we obtain

$$a = \frac{\sum Y}{n} = \frac{1913}{16} = 119.56 \quad\text{and}\quad b = \frac{\sum XY}{\sum X^2} = \frac{3749}{1360} = 2.76.$$

Thus the least-squares trend line is $\hat{Y} = 119.56 + 2.76X$, with origin at the middle of IV quarter
2002 and I quarter 2003, and X is measured in half quarter units. In the equation, b = 2.76 means that the
trend line rises by 2.76 each half quarter as time goes forward. The trend values are computed from the
trend line and shown in the last column of the table.


To make an estimate of the number of bags of fertilizer for I and II quarters of 2005, we need the
coded quarter numbers, which are X = 17 and 19. Substituting these values of X in the equation of the
trend line, we get


I quarter 2005:  $\hat{Y}$ = 119.56 + 2.76(17)
= 166.5 (hundred bags) and


II quarter 2005: $\hat{Y}$ = 119.56 + 2.76(19)
= 172.0 (hundred bags).


Example 13.7 Fit a second degree curve (parabola) to the following data and compute the
trend values.


Year:                      1931 1933 1935 1937 1939 1941 1943 1945
Index of wholesale prices:   96   87   91  102  108  139  307  289
(P.U., B.A.


Let X and Y denote respectively the coded year number and the index of wholesale prices. As an
even number of years is given and the years are 2 years apart, we can assign X = 0 to the
year 1938, X = 1, 3, 5, 7 to the following years and X = -1, -3, -5, -7 to the preceding years. Let the equation
of the trend curve of second degree fitting the data be

$$\hat{Y} = a + bX + cX^2,$$

where a, b and c are to be computed from the data by the least-squares method.
Since $\sum X = 0 = \sum X^3$, the normal equations obtained by the method of least-squares reduce to

$$\sum Y = na + c\sum X^2,$$
$$\sum XY = b\sum X^2,$$
$$\sum X^2 Y = a\sum X^2 + c\sum X^4.$$

The arithmetic involved in computing the required sums gives n = 8, $\sum Y = 1219$, $\sum XY = 2601$,
$\sum X^2 Y = 30995$, $\sum X^2 = 168$ and $\sum X^4 = 6216$.


Substituting these values in the normal equations, we get
1219 = 8a + 168c,
2601 = 168b,
30995 = 168a + 6216c.


Solving these equations, we obtain
a = 110.16, b = 15.48, and c = 2.01.
Hence the equation of the second degree trend or the quadratic parabola is

$$\hat{Y} = 110.16 + 15.48X + 2.01X^2,$$

where the origin X = 0 is the year 1938, and X is measured in 1 year units.

The computed trend values are shown in the last column of the table. The total of the
trend values agrees with the total of the original values.
13.5 DETRENDING


After the determination of trend by any method, the next step in the analysis of a time series is
to remove its effect from the observed time series. The process of removing the trend component is called
detrending. A detrended time series is also known as a stationary time series. The data in a detrended
series should fluctuate around a horizontal line, shown dashed in the accompanying figure.


The trend component is removed by computing either the deviations from the trend or the
ratios to trend, depending on whether the components follow the additive or the multiplicative model.
If the relationship is additive, we subtract the trend values from the corresponding original observations.
If, on the other hand, the model is multiplicative, we divide each of the original observations by the
corresponding trend value. The ratios so obtained are usually expressed as percentages.
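
The two ways of removing the trend can be sketched as follows. This is an illustrative sketch only: the function name `detrend` and its `model` argument are assumptions, and the trend values are taken from the line 119.56 + 2.76X fitted in Example 13.6 for the four quarters of 2001.

```python
def detrend(values, trend, model="multiplicative"):
    """Remove the trend: deviations (additive) or percentage ratios (multiplicative)."""
    if model == "additive":
        return [y - t for y, t in zip(values, trend)]
    if model == "multiplicative":
        return [100 * y / t for y, t in zip(values, trend)]
    raise ValueError("model must be 'additive' or 'multiplicative'")


# First four quarters of Example 13.1 and their trend values from Example 13.6
observed = [72, 98, 79, 106]
trend = [119.56 + 2.76 * x for x in (-15, -13, -11, -9)]

deviations = detrend(observed, trend, model="additive")
ratios = detrend(observed, trend, model="multiplicative")
print([round(r, 1) for r in ratios])
```

Under the multiplicative model the detrended values fluctuate around 100, while under the additive model they fluctuate around zero.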


13.6 ANALYSING THE SEASONAL VARIATIONS


Having removed the trend component from a time series, we are left with deviations or ratios
which are averaged for each season or month to measure the seasonal variation. The deviations are
usually called the seasonal differences, while the ratios, expressed as percentages, may be called
seasonal relatives. That is

$$\text{seasonal relative} = \frac{\text{original } Y\text{-observation}}{\text{corresponding trend value}} \times 100.$$

A measure of seasonal variation which is usually computed in index form is called a seasonal index. To
compute seasonal indices, the components of the time series are assumed to follow the multiplicative
(decomposition) model.

It is relevant to note that simple averages of the monthly or quarterly values over a period of years
are known as the seasonal averages. When these seasonal averages are expressed as percentages of the
average of all the seasonal averages, i.e. the grand average, they are called the seasonal indices. In symbols,

$$\text{Seasonal Index} = \frac{\text{seasonal average}}{\text{grand average}} \times 100.$$

A seasonal index may also be computed for weekly or daily data. Assuming that the
components follow the multiplicative (decomposition) model Y = TSCI, discussed earlier, we give below
the various methods available for computing a seasonal index.


i) The Percentage-of-Annual-Average Method.
ii) The Ratio-to-Moving-Average Method.
iii) The Ratio-to-Trend Method.
iv) The Link-Relative Method.


Let us now discuss each of these methods in turn.


13.6.1 The Percentage-of-Annual-Average Method. The first step is to eliminate the effect of
the trend. To this end, compute the simple averages for each year and divide each of the given monthly or
quarterly observations by the corresponding annual average, expressing the result as a percentage.


The next step is to average the percentages with a view to removing the cyclical and irregular
variations and computing the seasonal indices. For this purpose, sort out these percentages by months or
quarters and find the monthly or quarterly average percentages using either the mean or the median. In
case of mean, discard the extreme percentages under each month or quarter. If the 12 monthly or 4
quarterly average percentages do not average to 100, adjust them by multiplying each of them by a
suitable factor that will make the average of all the percentages equal 100. The resulting
percentages are the required seasonal indices. The computational procedure is illustrated by Example
13.8.

Example 13.8 Compute the seasonal indices for the data in Example 13.1, using the percentage-
of-annual-average method.


We first compute the annual averages as below:


Year      I     II    III    IV    Total   Annual average
2001     72     98     79   106     355        88.75
2002     79    122    101   143     445       111.25
2003     94    141    128   160     523       130.75
2004    125    143    135   187     590       147.50


We then divide each of the given quarterly observations by the corresponding annual average and
express the result as a percentage. The percentages so obtained appear below:


Year       I        II       III       IV       Total
2001      ...      ...      89.01     ...
2002     71.00   109.66     90.78   128.54
2003      ...      ...      97.90     ...
2004      ...      ...       ...      ...
Mean      ...      ...       ...    124.28     400.01


As the total of the average percentages is almost equal to the desired total of 400, no adjustment is
needed. Hence the 4 mean percentages are the seasonal indices.
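
The percentage-of-annual-average computation can be sketched in Python as below. The function name is an assumption of this sketch, and it averages all the percentages for a quarter with a plain mean (no extreme values discarded), which appears to be what Example 13.8 does since its fourth-quarter mean of 124.28 is the mean of all four percentages.

```python
def percentage_of_annual_average(table):
    """Seasonal indices from a table with one row per year and one column per season.

    Assumption for this sketch: the percentages are averaged with a plain mean.
    """
    percentages = []
    for row in table:
        annual_average = sum(row) / len(row)
        percentages.append([100 * value / annual_average for value in row])
    seasons = len(table[0])
    means = [sum(row[s] for row in percentages) / len(percentages)
             for s in range(seasons)]
    factor = 100 * seasons / sum(means)      # make the indices average to 100
    return percentages, [m * factor for m in means]


# Quarterly sales of Example 13.1, one row per year (2001 to 2004)
sales_table = [
    [72, 98, 79, 106],
    [79, 122, 101, 143],
    [94, 141, 128, 160],
    [125, 143, 135, 187],
]
percentages, indices = percentage_of_annual_average(sales_table)
print([round(i, 2) for i in indices])
```

Since the percentages of each year add to exactly 400 before rounding, the adjustment factor here is 1 apart from floating-point error; the total of 400.01 in the worked example comes from rounding the individual figures.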


13.6.2 The Ratio-to-Moving-Average Method. This is the most frequently used method for the
computation of seasonal index numbers. The first step is to eliminate the trend component. To this end,
divide the original observations for each month or quarter by the corresponding 12-month or 4-quarter
centred moving average and express the result as a percentage, i.e. compute the seasonal relative for each
month or quarter. It is worthwhile to note that each monthly or quarterly value is assumed to consist of
the product of the effects of the T, C, S and I components, and each moving average is a measure of the
combined effect of the trend and cyclical components, i.e. T × C. Thus dividing the original data by the
corresponding moving averages and then multiplying by 100, an estimate of the effect of the seasonal and
irregular components combined is obtained; that is

$$\frac{\text{original data}}{\text{moving average}} \times 100 = \frac{TCSI}{TC} \times 100 = SI \quad (\text{seasonal relative}).$$

The next step is to remove the effects of the irregular component in order to obtain seasonal
indices. To achieve this end, arrange the seasonal relatives by months or quarters and find the monthly or
quarterly averages, using either the mean or the median. If the mean is to be used, compute a modified mean
by discarding the unusually large or small seasonal relative under each month or quarter so that the
average is not distorted. If these monthly or quarterly averages do not average to 100, then adjust them by
multiplying each median or modified mean estimate of seasonal index by the correction factor that will
make the average of all the indices equal 100. The resulting averages give the desired indices of seasonal
variations.

Example 13.9 Compute the seasonal indices by using the ratio-to-moving-average method for the
data in Example 13.4.

The original data and the 4-quarter centred moving averages, which measure the combined effect
of trend and cyclical components, appear in columns (2) and (3) respectively in the table below. We then
divide each of the original quarterly values by the corresponding centred moving average and express the
result as a percentage, i.e. we compute the seasonal relatives, shown in column (4).

Year and quarter (1) | Y-values, TCSI (2) | 4-quarter centred moving average, TC (3) | Seasonal relatives, (TCSI ÷ TC) × 100 = SI (4)


In order to remove the irregular effects and to compute the seasonal index for each quarter, the
seasonal relatives are arranged in the following table.


Computation of Seasonal Indices


Year               I        II       III       IV      Total
2001              --        --      88.2     113.4
2002             79.6     114.4     89.3     121.8
2003             76.4     109.6     95.1     115.3
2004            *89.3     *99.2      --        --
Total           156.0     224.0    272.6     350.5
Mean             78.0     112.0    90.87    116.83     397.70
Seasonal index  78.45    112.65    91.40    117.51     400.01


*Discard these extreme relatives before computing total.

Since the total of the 4 mean percentages is 397.70, we therefore adjust them by multiplying each
of the 4 quarterly mean percentages by 400/397.70, as the sum of the 4 quarterly measures should be 400.
Thus the desired seasonal indices are obtained as 78.45, 112.65, 91.40 and 117.51. These indices of
seasonal variation are shown graphically below.


[Figure: Seasonal indices by quarters I to IV.]

13.6.3 The Ratio-to-Trend Method. In this method, the trend values are obtained for each time
period by fitting a least-squares trend line either to the observed time series data or to the annual
averages. The rest of the computational procedure is the same as that of the ratio-to-moving-average
method. But this method is inferior to the ratio-to-moving-average method as the seasonal index
obtained by it includes cyclical and irregular variations.

Example 13.10 Compute the indices of seasonal variation by the ratio-to-trend method by fitting a
least-squares straight line trend to the observed time series data in Example 13.1.

We obtain the trend values by fitting a linear trend to the observed time series data (Example 13.6).
We then divide each observation in the original data by the corresponding trend value and multiply it by
100. The percentages so obtained are arranged by quarters as shown in the following table in order to
compute the indices of seasonal variation.


Computation of Seasonal Indices

                        Quarter
                     I        II       III       IV      Total
Ratios to         *92.1     117.1      88.6     111.9
trend              78.8     115.3      90.7    *122.4
                   76.9     110.3     *96.0     115.2
                   86.6     *95.4      86.9     116.1
Total             242.3     342.7     266.2     343.2
Mean               80.77    114.23     88.73    114.40    398.13
Seasonal index     81.15    114.77     89.15    114.94    400.01

*Discard these extreme relatives before computing totals.


The sum of the 4 mean percentages is 398.13. The mean percentages are therefore adjusted by
multiplying each of them by the correction factor 400/398.13 to get the desired sum of 400.

Example 13.11 Obtain the seasonal indices by the ratio-to-trend method by fitting the
least-squares trend line to the annual averages of the following data, showing the amount
(£000,000) spent upon passenger travel in the United Kingdom at levels of fares and charges
during the periods given.


We first fit a straight line by the method of least-squares to the annual averages (Y), which are
assumed to correspond to the midpoint, i.e. June 30 or July 1, of the corresponding year.

The equation of the straight line (linear trend) is Y = a + bX. Since there is an even number of
years, the origin is taken at December 31, 2004 or January 1, 2005. The normal equations then reduce to

$$\sum Y = na \quad\text{and}\quad \sum XY = b\sum X^2.$$

Substituting the values in the normal equations and solving them, we get a = 89.5 and b = 1.6.

Thus the required trend line is Y = 89.5 + 1.6X, where X is measured in half years.

This line shows that the values of Y increase by 1.6 after every half year or 1.6/2 = 0.8 after every
quarter. Assuming that the given quarterly data correspond to the middle of the quarter, we calculate the
trend values as below:

When X = 0, which corresponds to January 1, 2005, Y = 89.5.
But we need the value of Y a half quarter later. Thus

$$Y = 89.5 + \tfrac{1}{2}(0.8) = 89.9$$

This is the trend value corresponding to the first quarter of 2005. Now, by successive addition of 0.8 to
89.9, the trend values for the 2nd, 3rd and 4th quarters of 2005 and the quarters of 2006 are obtained,
while by successively subtracting 0.8 from 89.9, the trend values for the preceding quarters are found.
The quarterly trend values thus found are given below:
Year       I       II      III     IV
2003      83.5    84.3    85.1    85.9
2004      86.7    87.5    88.3    89.1
2005      89.9    90.7    91.5    92.3
2006      93.1    93.9    94.7    95.5
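The successive addition and subtraction of 0.8 described in this example can be checked with a short Python sketch. The function name and its parameters are hypothetical; the constants a = 89.5 and b = 1.6 (per half year) are the ones obtained above, and eight quarters on either side of the origin are assumed.

```python
def quarterly_trend(a, b_per_half_year, quarters_before, quarters_after):
    """Trend values at the middle of each quarter around the origin.

    a               : trend value at the origin (X = 0)
    b_per_half_year : increase in the trend per half year
    """
    step = b_per_half_year / 2      # increase per quarter
    first = a + step / 2            # middle of the first quarter after the origin
    return [round(first + k * step, 1)
            for k in range(-quarters_before, quarters_after)]


# Eight quarters before and eight quarters after January 1, 2005.
values = quarterly_trend(89.5, 1.6, 8, 8)
for year, start in zip((2003, 2004, 2005, 2006), range(0, 16, 4)):
    print(year, values[start:start + 4])
```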


Dividing each of the actual values by the corresponding trend value and expressing the result as a 
percentage, we obtain: 


The sum of these mean percentages is 399.7, which is so close to the desired 400 that no
adjustment is necessary. Hence the desired seasonal indices are 82.4, 103.0, 124.2 and 90.1. The median can
also be used to get the seasonal indices.


13.6.4 The Link-Relative Method. This method was at one time the most widely used method as
the data for each month or quarter were utilized more completely. But nowadays it is seldom used as its
disadvantages outweigh its advantages.


To eliminate the trend component, the computational steps are as follows:


i) Compute the link relatives by expressing each monthly or quarterly value as a percentage of
the preceding monthly or quarterly value.


ii) Arrange the link relatives by months or quarters and find an appropriate average of these
relatives for each month or quarter. Usually the median is used.


iii) Convert the average (median or mean) relatives into a series of chain relatives by setting the
value of January or the first quarter as 100, and carrying the process to include the first unit of
the next period.


iv) A discrepancy due to trend increment (positive or negative) exists between the chain relative
for the first January or quarter and that for the next period. Adjust the chain relatives for the
trend component by subtracting one-twelfth of the discrepancy from the value of February,
two-twelfths from the value of March and so on, or by subtracting one-fourth of the discrepancy
from the relative of the second quarter, two-fourths from the third quarter relative and three-fourths
from the fourth quarter relative.


To obtain seasonal indices, reduce the adjusted chain relatives to the same level as January or the
first quarter by multiplying each of the adjusted chain relatives by the correction factor that will make the
average of all the indices equal 100. These final figures are the desired indices of seasonal variation.


Example 13.12 Obtain the seasonal indices for the data in Example 13.11, using the link-relative
method.


Expressing the data for each quarter as a percentage of the data for the preceding quarter, we get
the link relatives as below:


91.0 126.8 
92.4 124.7 


93.8 127.6 


92.4 126.1 121.0 73.1


Next, we calculate the chain relatives for the four quarterly averages, setting the value of the first
quarterly average equal to 100%. The chain relatives are:

Quarter             I       II       III      IV       I
Chain relative     100     126.1    152.6    111.6    103.1

Continuing the process, the chain relative for the first quarter works out to be 103.1 which, as a
matter of fact, ought to have been 100. This increase of 3.1 is due to the trend component present in
the data. An adjustment for the trend therefore becomes necessary. Since the difference is positive, we
subtract one-fourth of this from the second quarter figure, two-fourths from the third quarter figure and
three-fourths from the fourth quarter figure. The sum of the adjusted chain relatives is 485.65. The quarterly
figures are further adjusted by multiplying each figure by 400/485.65 so as to get a total of 400. The adjusted
figures are given below:

Quarter                     I       II       III      IV      Total
Adjusted chain relative    100     125.32   151.05   109.28   485.65
Seasonal index             82.4    103.2    124.4     90.0    400.0
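The chain-relative arithmetic of this example can be retraced with the following Python sketch. The function names are hypothetical, and the code carries full precision through every step instead of rounding the intermediate chain relatives, so the intermediate figures differ slightly from the rounded ones in the table; the final indices agree to one decimal place.

```python
def chain_relatives(avg_links):
    """Chain relatives from average link relatives.

    avg_links[0] is the average link relative of the first period
    (relative to the last period of the preceding year).
    Returns the chain and the chain relative carried to the next first period.
    """
    chain = [100.0]
    for link in avg_links[1:]:
        chain.append(chain[-1] * link / 100.0)
    closing = chain[-1] * avg_links[0] / 100.0
    return chain, closing


def link_relative_indices(avg_links):
    """Seasonal indices by the link-relative method."""
    chain, closing = chain_relatives(avg_links)
    n = len(chain)
    discrepancy = closing - 100.0
    # Subtract 0/n, 1/n, 2/n, ... of the discrepancy from successive periods.
    adjusted = [c - discrepancy * i / n for i, c in enumerate(chain)]
    factor = 100.0 * n / sum(adjusted)
    return [a * factor for a in adjusted]


# Average link relatives for quarters I to IV from the example.
indices = link_relative_indices([92.4, 126.1, 121.0, 73.1])
print([round(v, 1) for v in indices])
```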


13.7 DESEASONALIZATION OF DATA 


We remove the seasonal effect from an observed time series to see how things might have
been if there had been no seasonal component. The process of removing the seasonal component from
a time series is known as deseasonalization or seasonal adjustment of data, and the time series so
obtained is called the deseasonalized or seasonally adjusted time series. To get deseasonalized data, we
divide (considering the multiplicative model) each value in the original data by the corresponding value of
the seasonal index and multiply the result by 100. Thus

$$\text{Deseasonalized data} = \frac{\text{Original value}}{\text{period's seasonal index}} \times 100 = \frac{TCSI}{S \times 100} \times 100 = TCI$$
Thus the deseasonalized data contain the effects of trend, cyclical and irregular components. For
example, the deseasonalized data for 2002 of Example 13.9 are shown below:


Quarter | Number of bags of fertilizer (00) | Seasonal Index | Deseasonalized Data


We find that the increase from the first quarter to the second quarter of 2002, expected on the basis
of seasonal pattern, is (79 × 112.65/78.45 − 79), i.e. 34 hundred bags, which is less than the actual increase of
122 − 79, i.e. 43 hundred bags. From the second quarter to the third quarter, there is a decrease of 122 − 101,
i.e. 21 hundred bags, which is less than the decrease expected on the basis of seasonal pattern amounting
to (122 − 122 × 91.40/112.65), i.e. 23 hundred bags of fertilizer. If there had been no seasonal effect, sales for
the first quarter of 2002 would have been 101 (hundred) bags of fertilizer.

When the time series components follow the additive model, i.e. Y = T + C + S + I, the data are
deseasonalized by subtracting the seasonal effects from the corresponding original values.
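Both forms of deseasonalization can be sketched in a few lines of Python. The function name and the `model` argument are hypothetical; the figures in the usage lines are the first three quarterly values of 2002 quoted in the text (79, 122 and 101 hundred bags) with the seasonal indices of Example 13.9.

```python
def deseasonalize(values, seasonal, model="multiplicative"):
    """Remove the seasonal component from a series.

    multiplicative: seasonal holds indices in percentage form (base 100).
    additive      : seasonal holds seasonal effects in the units of the data.
    """
    if model == "multiplicative":
        return [100.0 * y / s for y, s in zip(values, seasonal)]
    if model == "additive":
        return [y - s for y, s in zip(values, seasonal)]
    raise ValueError("model must be 'multiplicative' or 'additive'")


sales = [79, 122, 101]                 # first three quarters of 2002 (00 bags)
indices = [78.45, 112.65, 91.40]       # seasonal indices for quarters I to III
adjusted = deseasonalize(sales, indices)
print([round(v, 1) for v in adjusted])
```

The first adjusted value is about 100.7, which rounds to the 101 (hundred) bags mentioned above for the first quarter.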

13.8 ANALYSING THE CYCLICAL MOVEMENTS


The cyclical variations can be measured by first removing the trend and seasonal components by
division and then averaging out the irregular variations. The simplest method of obtaining cyclical movement
is called the residual method. This method consists of removing the effects of trend, seasonal and
irregular components from the observed time series data in any order. Any one of the following three
procedures can be used to estimate cyclical and irregular movements:


First Procedure:


i) Deseasonalize the data, i.e. divide each value of the original data by the corresponding
seasonal index to remove the seasonal component: TCSI ÷ S = TCI.


ii) Divide the results just obtained, i.e. the deseasonalized data, by the corresponding trend value
to eliminate the trend component: TCI ÷ T = CI.

Second Procedure:


i) Remove the trend component by dividing each value of the original data by the corresponding
trend value: TCSI ÷ T = CSI.


ii) Eliminate seasonal variation by dividing the results, i.e. the detrended data, by the corresponding
seasonal index: CSI ÷ S = CI.

Third Procedure:

i) Multiply each trend value by the corresponding value of the seasonal index to get T × S values.


ii) Eliminate trend and seasonal components by dividing each value of the original data by the
corresponding T × S value obtained in (i): TCSI ÷ TS = CI.
All these three procedures give the same results. In order to remove the irregular variations, if any,
take an appropriate moving average of a few months or quarters duration. The resulting quantities in 
percentage form are called cyclical relatives or percentages. Do not divide by T.
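That the three procedures lead to the same cyclical-irregular percentages can be seen from a small Python sketch. All function names are hypothetical, the seasonal index is taken in percentage form (base 100), and the trend value 95.0 used in the check is an invented figure for illustration only.

```python
def ci_first(y, trend, seasonal_index):
    """First procedure: deseasonalize, then divide by trend."""
    deseasonalized = 100.0 * y / seasonal_index      # TCSI / S = TCI
    return 100.0 * deseasonalized / trend            # TCI / T = CI (percent)


def ci_second(y, trend, seasonal_index):
    """Second procedure: detrend, then divide by the seasonal index."""
    detrended = 100.0 * y / trend                    # TCSI / T = CSI (percent)
    return 100.0 * detrended / seasonal_index        # CSI / S = CI (percent)


def ci_third(y, trend, seasonal_index):
    """Third procedure: divide by the product T x S in one step."""
    return 100.0 * y / (trend * seasonal_index / 100.0)


def moving_average(values, k=3):
    """Simple centred moving average of odd length k (None at the ends)."""
    half = k // 2
    return [None if i < half or i >= len(values) - half
            else sum(values[i - half:i + half + 1]) / k
            for i in range(len(values))]


# Hypothetical check: Y = 79, T = 95.0 (invented), S = 78.45.
print(ci_first(79, 95.0, 78.45), ci_second(79, 95.0, 78.45), ci_third(79, 95.0, 78.45))
```

Applying `moving_average` to the resulting CI percentages smooths out the irregular component and leaves the cyclical relatives.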


Another method for isolating cyclical movements is the Harmonic analysis, which is beyond the 
scope of this text and hence is not discussed. A study of cyclical relatives is useful for economic 
forecasting. 


Example 13.13 Compute the cyclical relatives for the data in Example 13.1. 


The process of computing the cyclical-irregular movements and cyclical relatives is shown in the
table below. To remove the irregular variations, a three-quarter moving average has been thought
appropriate.


Y-values Trend Cyclical- 
TSCI Values 7 Irregular- 
percentages 
Ci) (6) 


13.9 ANALYSING THE IRREGULAR VARIATIONS 


The irregular movements of a time series are estimated by dividing the combined cyclical-irregular
variations by the corresponding values of the cyclical relatives; that is

$$I = \frac{C \times I}{C}$$

The irregular movements can be shown graphically.
13.10 FORECASTING


In time series analysis, forecasting is a process of assessing the magnitude of a time series variable 
which it will assume at some future point of time. Forecasting is based on the assumption that the past 
pattern and behaviour of a variable will continue in the future. The simple and elementary technique of 
short-term forecasting involves the components of trend and seasonal index. 


Forecasts for a period of 1 year or less can be made by projecting either the least-squares equation
to obtain forecasts of trend values (T) or the centred moving averages to obtain forecasts of moving
average values (T×C), then multiplying these projected values by the seasonal index for that period and
dividing by 100 to obtain a T×S or T×C×S forecast. Thus

$$\text{period's forecast} = \frac{(\text{period's projected trend value}) \times (\text{seasonal index})}{100}$$

It is relevant to note that it is very difficult to forecast cyclical and irregular movements.
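As an illustration of the forecasting rule, the sketch below projects the trend line of Example 13.11 (0.8 per quarter, with 95.5 as the trend value for the last quarter of 2006) one year ahead and applies the ratio-to-trend seasonal indices 82.4, 103.0, 124.2 and 90.1. The function name is hypothetical, and the forecast for 2007 is not part of the original example; it assumes that the same trend and seasonal pattern continue to hold.

```python
def seasonal_forecast(projected_trend, seasonal_index):
    """T x S forecast: projected trend times seasonal index (percent) / 100."""
    return projected_trend * seasonal_index / 100.0


# Projected trend for the four quarters of 2007: 95.5 + 0.8, then +0.8 each quarter.
trend_2007 = [96.3 + 0.8 * q for q in range(4)]
indices = [82.4, 103.0, 124.2, 90.1]
forecasts = [seasonal_forecast(t, s) for t, s in zip(trend_2007, indices)]
print([round(f, 2) for f in forecasts])
```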


13.10.1 Forecasting by Exponential Smoothing. Exponential smoothing is a method of
forecasting that assigns positive weights to past and current values. This technique often provides
short-term forecasts. The exponentially smoothed series $\hat{Y}_t$, obtained from the original time series $Y_t$, is
calculated as follows:

$$\hat{Y}_1 = Y_1,$$
$$\hat{Y}_2 = wY_2 + (1-w)\hat{Y}_1,$$
$$\vdots$$
$$\hat{Y}_t = wY_t + (1-w)\hat{Y}_{t-1},$$

where the weight w, called the exponential smoothing constant, is selected so that w is between 0
and 1. The most commonly used value of w is between 0.01 and 0.3. Exponential smoothing has the
advantage that no values are lost at either end of the smoothed series.
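The recursion for the exponentially smoothed series translates directly into code. The following is a minimal Python sketch with a hypothetical function name; the three-value series in the usage line is invented purely to show the arithmetic.

```python
def exponential_smoothing(series, w):
    """Exponentially smoothed series with smoothing constant w (0 < w < 1).

    The first smoothed value equals the first observation; each later value
    is w times the current observation plus (1 - w) times the previous
    smoothed value.
    """
    if not 0 < w < 1:
        raise ValueError("w must lie strictly between 0 and 1")
    if not series:
        return []
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10, 20, 30], 0.5))
```

A small w gives heavy weight to the past and hence a smoother series; a larger w lets the smoothed series follow the observations more closely. The output has as many values as the input, which reflects the remark that no values are lost at either end.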


13.11 SERIAL CORRELATION


While analysing time series data, there is a possibility of dependence (or association) between the
successive observations. In case the successive observations are dependent, one measure of this effect is
the simple correlation between successive observations. Such a correlation is called a serial correlation.


Generally, a serial correlation is defined as correlation between observations ordered in time.
Given n observations $Y_1, Y_2, \ldots, Y_t, \ldots, Y_n$ made over successive time periods, we change the
observations into (n−1) pairs such as $(Y_1, Y_2), (Y_2, Y_3), \ldots, (Y_{n-1}, Y_n)$. If we regard the first observation in
each pair as $Y_t$, then the other observation is $Y_{t+k}$, where k can be 1, 2, 3, etc. The correlation between
$Y_t$ and $Y_{t+1}$, i.e. the correlation between successive overlapping pairs, is called the serial correlation of
first order. The co-efficient of first serial correlation, denoted by $r_1$, is generally calculated by the
following slightly modified formula

$$r_1 = \frac{\sum_{t=1}^{n-1}(Y_t-\bar{Y})(Y_{t+1}-\bar{Y})}{\sum_{t=1}^{n}(Y_t-\bar{Y})^2},$$

where $\bar{Y} = \sum Y_t / n$.


This is also called the co-efficient of auto-correlation at lag 1. The terms serial correlation and
autocorrelation are used interchangeably.


Likewise, we can find the correlation coefficient between observations separated by a lag of k
time periods, which is given by

$$r_k = \frac{\sum_{t=1}^{n-k}(Y_t-\bar{Y})(Y_{t+k}-\bar{Y})}{\sum_{t=1}^{n}(Y_t-\bar{Y})^2}$$

This is called the coefficient of autocorrelation at lag k or serial correlation of order k.


We can also draw a scatter diagram by plotting the pairs $(Y_t, Y_{t+1})$ on graph paper to see whether
the successive observations appear to be correlated.


Example 13.14 Sixteen successive observations on a stationary time series are as follows:
1.6, 0.8, 1.2, 0.5, 0.9, 1.1, 1.1, 0.6, 1.5, 0.8, 0.9, 1.2, 0.5, 1.3, 0.8, 1.2


Calculate $r_1$, the first serial correlation co-efficient.


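The co-efficient of serial correlation of order 1 is given by

$$r_1 = \frac{\sum_{t=1}^{n-1}(Y_t-\bar{Y})(Y_{t+1}-\bar{Y})}{\sum_{t=1}^{n}(Y_t-\bar{Y})^2},$$

where $\bar{Y} = \frac{1}{n}\sum_{t=1}^{n} Y_t = \frac{16.0}{16} = 1.0$.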
The calculations needed to compute $r_1$ give

$$\sum_{t=1}^{16}(Y_t-\bar{Y})^2 = 1.64 \quad\text{and}\quad \sum_{t=1}^{15}(Y_t-\bar{Y})(Y_{t+1}-\bar{Y}) = -0.90.$$

Substituting the values in the formula, we get

$$r_1 = \frac{-0.90}{1.64} = -0.55$$

Hence the serial correlation of first order between the successive observations is found to be
-0.55.
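The coefficient of serial correlation of any order can be computed with a few lines of Python. The sketch below uses a hypothetical function name and follows the slightly modified formula given in the text (the sum of squares in the denominator runs over all n observations); it is applied to the sixteen observations of Example 13.14.

```python
def serial_correlation(series, k=1):
    """Coefficient of serial correlation (autocorrelation) at lag k."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    numerator = sum(dev[t] * dev[t + k] for t in range(n - k))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


data = [1.6, 0.8, 1.2, 0.5, 0.9, 1.1, 1.1, 0.6,
        1.5, 0.8, 0.9, 1.2, 0.5, 1.3, 0.8, 1.2]
r1 = serial_correlation(data, k=1)
print(round(r1, 2))
```

Calling the same function with `k=2` gives the coefficient of autocorrelation at lag 2.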


EXERCISES


a) Answer 'True' or 'False'. If the statement is not true then replace the underlined words with words
that make the statement true:


Secular trend measures the фонны variation of a time series. 
ii) A typical time series may be regarded as composed of five components.
iii) A secular trend is mainly caused by the change in seasons.
iv) Irregular variations can be predicted in time.
v) A histogram is a graph of time series.
vi) The additive model of a time series is Y = TSCI.


vii) A main objective of fitting trend lines is to forecast cyclical turning points.


viii) Seasonal indexes can be calculated from monthly and quarterly data only.


ix) If the yearly sales for year 2007 is Rs.5,00,000/- and the sales index for year 2007 is 62. 
the seasonally adjusted sales figure is Rs.4,00,000/-. 


x) Yearly time series contain the following four components: trend, cyclical, seasonal and
irregular.


b) MULTIPLE CHOICE QUESTIONS


i) Decomposing a time series means that past data is distributed into components of:
a) Trend, cycles, seasonal and random
b) Long term, medium term and short term variations
c) Constants and variations
d) All of above

ii) The seasonal variation in the time series is computed by
a) Ratio to moving average method
b) Ratio to trend method
c) Link relative method
d) All of above


iii) The seasonal variation in the "оће is computed by 
x 

b) Cyclical
c) Seasonal
d) Irregular

iv) After detrending the data, the time series (multiplicative model) consists of
a) Y=TSCI
b) Y=TSI
c) Y=CSI
d) None of above

v) If a time series changes at exact constant percentage then
a) A good fitted trend line cannot be obtained
b) A linear line fitted to the data gives a perfect fit
c) A linear line fitted to the logarithms of the data gives a perfect fit
d) A nonlinear line is required to be fitted

vi) Dividing the original time series by moving average, the time series (multiplicative model)
consists of
a) Y=SI
b) Y=CS
c) Y=TS
d) Y=TC

vii) A company's trend figure for sales for December 2007 is Rs.2,00,000/-. Actual sales during
that period were Rs.1,60,000/- and C × I = 0.80. The value for seasonal index is
a) 100
b) 125
c) 60
d) 64

viii) A second degree trend line is Y = 15 − 0.1t + 0.05t², where Y is sales (in thousands) and t is
time (in years) and t = 1 for 1995. What is the trend value for year 2007?
a) 20,000
b) 22,150
c) 18,000
d) 15,000

ix) If a 4-quarter moving average is used to obtain short-term forecasts, it contains the
following components
a) TC
b) TS
c) CSI
d) C only

x) Exponential smoothing is a forecasting method which
a) uses the actual data, not the forecast data
b) requires to fit a mathematical model to the data
c) gives equal weight to all time periods
d) all of above


SUBJECTIVE


13.1 a) Define a time series. What are its various components? Describe each carefully.
b) Associate the following phenomena of business with the components of time series to which they
belong:
(i) The prosperity in a business.
(ii) The production of sugar recorded for 1956, 1957, ..., 1962.
(iii) The weekly statement of the sale of pens.
(iv) The festival sale.
(v) The fire in a factory. (P.U., B.A./B.Sc. 1


13.2 a) Define the following terms:


(i) Time Series Analysis, (ii) Secular trend, (iii) Seasonal variations, (iv) Cyclical
fluctuations, (v) Irregular movements.


b) With which characteristic movement of a time series would you mainly associate each of
the following?


i) a fire in a factory delaying production for 3 weeks,
ii) an era of prosperity,
iii) an after Eid sale in a departmental store,
iv) a need for increased wheat production due to an increase in population,
v) the monthly number of inches of rainfall in a city over a 5-year period,
vi) a recession,
vii) an increase in employment during summer months,
viii) the decline in the death rate due to advances in science,
ix) a steel strike,
x) a continually increasing demand for smaller automobiles.
(P.U., B.A. (Part II), 1966, 1
13.3 Describe the components of a time series. Describe various methods of measuring
trend in a time series, the advantages and disadvantages for each.
(P.U., B.A./B.Sc. 1961.


13.4 Describe the different components of time series. Discuss the measurement techniques of any
of them. (P.U., M.A. (Econ.), 1


13.5 What do you understand by Time Series Analysis? Discuss how you would analyse a time series to
determine the trend and the seasonal variation.


13.6 What is a time series? What are its main components and how will you isolate them?
(P.U., B.A./B.Sc. 1961.


13.7 What is meant by seasonal variation? Explain how seasonal variations are measured and removed
from the time series data.


13.8 a) Distinguish between the Additive and Multiplicative models in time series analysis.


b) When do you compute the deviations from trend and when ratios to trend? Explain how to
eliminate the average seasonal variations from the observed values of the time series.
13.9 Critically comment on various methods of eliminating seasonal variations from a time series.
(P.U., M.A. (Stat.), 1968)
13.10 Plot the following data showing average wages in rupees of some workers during 1990-2001 and
find the trend by the method of (i) free-hand curve and (ii) semi-averages.
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001


140, 148, 180, 195, 200, 235, 260, 280, 290, 330, 325, 340


13.11 Name the methods used to measure trends. Determine a trend line by а simple moving average of 5 
years from the following data: 


1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930


Value | 102 108 130 140 15 180 16 210 220 230
(P.U., B.A./B.Sc. 1963)


13.12 i) Explain briefly the meaning and purpose of moving averages.


ii) The number of items of certain product imported into the United Kingdom is given below in
thousands of units:


Year Number Year Number W? Number 
1951. 170 1957. 205 qe 135 
| © 


1952 210 1958 S 1964 80 
1953 188 1959 1965 60 
1954 98 КШ 1966 107 
1955 1967 140 


1956 131 Ся 183 1968 124 


a) Calculate 5- ig averages for the юр data. Using these moving averages, 
determine 


b) Estimate the number of items imported in 1969. (P.C.S., 1971)


13.13 a) | What is a Moving Average? Calculate a seven-day moving average for the following record 
of attendances: 


E Mon Tues Wed Thurs 


Plot the given attendances and the moving averages on the same graph. 
(B.Z.U., B.A./B.Sc. 1988) 


b) Fit a linear trend to the data given in (a) by least squares. Find trend values as well.
(P.U., B.A./B.Sc. 1990)
13.14 Explain the use of moving averages in determining the trend line in a time series. Determine such a
line in the following series of values by the use of a simple average of seven consecutive terms:
Year:  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
Index:  187  161  149  142  125  129  133  127  130  129  109
Year:  1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Index:  130  136  152  171  169  218  258  279  295  314
Would you say that this was a satisfactory line of a trend? If not, why not?
13.15 a) Find out and plot the nine-year moving average for the following series:
8, 7, 5, 2, 4, 9, 10, 9, 8, 6, 4, 7, 11, 13, 11, 9, 8, 5, 10, 13, 15, 12, 10, 8, 6, 11, 12, 16.
b) Compute 4-month centred moving average from the following:
23, 26, 28, 30, 31, 35, 37, 32, 34, 38. (P.U., B.A. (Hons.), 1


13.16 The following are the quarterly index numbers of wholesale prices in the U.K. for the 
1951-55: 


86, 80, 83, 84, 85, 80, 80, 78, 77, 80, 81, 80, 82, 81, 83, 82, ем, 85, 86. 


Ву a centred moving average of 4, calculate the trend. 
(ғ ҘҰМ.А. Econ. 1969; B.A./B.Sc. 1 


13.17 Plot the following data as a time series. Compute чете moving average trend and 
it on the graph. 


106 73 231 
281 229 209 488 
484 447 457 966 


6 1997 1998 1999 2000 2001 2002 2003 2004 


79 

а) Show by direct numerical calculation that the 2-year centred moving average is equi 
a 3 year weighted moving average with weights 1, 2, 1 respectively. 

b) Determine a 3 year weighted average if the weights 1, 4, 1 are used. 


13.19 For the following time series, determine the trend by using the method of (i) semi 
(ii) 3-year moving averages, and (iii) least-squares for fitting a straight line: 


1968 1969 1970 1971 1972 1973 1974 1975 1976 


series 


Which of the trend do you prefer, and why? (P.U., B.AJB.Sc. 
13.20 a) Define the following terms:
Secular Trend; Time series Decomposition; Centred Moving Average; Irregular Movements.


b) For the following time series, determine the trend by using the method of (i) three year 
moving averages, and (ii) least-squares for fitting a straight line. 


| Yea |1970 1971 1972 1973 1974 1975 1976 1977 1978 


^ Which trend do you prefer, and why? (P.U., В.А./В.ӛс. 1978) 
Ey 


The following are the quarterly index numbers of wholesales prices. 


1995 1996 1997 1998 1999 
Quarterly Index | 125 114 99 80 80 


Fit a linear trend to these data and add the trend to the original Ән. 


b) Fita straight line Y=a+bX from the following results, фгдһ еаг 1948-58 (both inclusive). 
>Х-0,2У-4389,2Х:-110,2 ХҮ 2 3 
Find out the trend values of Y as well. 457 (P.U., М.А. (Econ.), 1968) 


322 The following are the annual profits is thousands. @ ‘Tupees in a certain business: 


1997 1998 1835 2000 2001 2002 2003 
88 таков 91 113 120 122 


i) Use the method of least- Wes to fita straight line trend and make an estimate of the profits 
in 2005. КА, 
X 
ii) Fita parabolic trend SY 


iii) Determine which is the better fitting trend. 


The production of vegetables ghee (’000s tons) in Pakistan is given below: 
Year Production Year Production 


1970-71 136 1974-75 272 
1971-72 162 1975-76 277 
1972-73 187 1976-77 322 
1973-74 225 


Fit a second degree parabola to the data and estimate the production for 1978-79. 
(P.U., B.AJB.Sc. 1979) 
13.24 Fit a parabola of second order to the following data and find out the trend values.


1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 


Production 
(in 000 mds) 
13.25 Fit a second degree curve У-ао” аџ+ аз to the following data: 


2001 2002 2003 2004 2005 2006 2007 2008 


| Profit | 273.7 293.5 315.0 336.8 364.4 394.8 424.2 458.7 


Compute the trend values. · 


13.26 Fita quadratic parabola to the following series of observations, taking the year as the inde 
M 


1924 1927 1930 1933 1939 


E of 
coal price 
Use your results to estimate the value of the € 1935. 
(P.U., В.А. (Hons.) Parti 
13.27 The population of a country for the 11 to 1971 in ten yearly intervals in millions, is 
: . 538/722; , 12.70, 17.80, 24.02, and 31.34. 
Fit a curve of the type Y=ab; Nn data and forecast the population for the year 1991. 


(P.U., В.А /В. 5-2 
13.28 а) Define the кн еле , 
(i) (ii) Seasonal variation, (iii) Seasonal Index, (iv) Deseasonaliz: 
b) Compute the seasonal indices for the four quarters by the method of ratio tc 


averages. 


(P-U., B.AJB.Sc. 
13.29 Compute the seasonal indices for the four quarters by the ratio-to-moving average method from the
following data of wholesale prices:


Year ЕНЕСІНЕН 
I H IH IV 


122 125 118 117 
119 114 114 109 
10525995 393-— 189 
86 80 83 84 


(Р.О: В.А./В.ӛс. 1973, 80-5) 


13.30 A merchant's sale (7005 tons) of ordinary coal over а period were as shown below: 


a) Construct seasonal indices, using the percentage gf 


b) Construct quarterly seasonal index values 
them to deseasonalize the 1997 values. 


13.31 Quarterly sales of a certain fertilizer over 


1994, 0» 1995 1996 1997 1998 


By means of centred moving-averages, compute the trend and estimate the seasonal indices, and 
hence forecast sales for each quarter of 1999. 
13.32 Compute seasonal indices for the four quarters by the ratio to-trend method for the data in question 


13.29. 
13.33 Compute the indices of seasonal variation by ratio-to-trend method from the following data:


Summer Autumn | Winter Spring 


Use the seasonal indices to v the 1984 values. 




13.34 Construct seasonal indices by ratio-to-trend method from the following 


13.35 For the time series data in exercise 13.28(b), 


a) determine the trend line by the least-squares method; 
b) assuming the multiplicative model, compute the following: 
i) seasonal indices for the four quarters; 


ii) deseasonalized values. 


13.36 Calculate the indices of seasonal variation by link-relatives method from the following data:


112 
119 


С 


<920 
IA 2 


13.37 a) 
b) 


Use the trend equation and the seasonal index to forecast the sales for each quarter of 


с) Calculate the exponentially smoothed series, using w=0.1 and w=0.3. Which will p 
smoother trend? 


13.38 a) Describe the residual method in time series analysis. 


b) Use the data of exercise 13.29 to calculate the cyclical irregulars and cyclical relatives. 
13.39 The following data pertain to the rainfall in inches in England and Wales:


Jan Feb Mar April May June July Aug Sep Oct Моу Dec 


32 34 34 


32 42 45 
ZE 250747 
54 49 15 


a) Determine the trend by the least-squares method. 
b) Оп the basis of the least-squares line and the multiplicative model, compute the following: 
i) seasonal indices for the twelve months; і 
ii) deseasonalized values; 
iii) cyclical and irregular variations. 


13.40 a) Explain what you understand by serial correlation. S 
b) Тһе following noise measurements were recorded at Желп іп time order they were 
observed: 


65, 64, 63. 61, 60, 58, 63, 64, 62, 64, 63, 63, 62. e£. 64, 66, 68, 68, 69. 


i) Plot the scatter diagram for the pairs Ps 
ii) Calculate the first serial correlation coefficient r₁ and the coefficient of
auto-correlation of lag 2.
ANSWERS TO EXERCISES


Chapter 1, Pp. 11-14 


OBJECTIVE
(i) F (are), (ii) F (parameter), (iii) F (statistic), (iv) F (inferential),
(v) F (descriptive), (vi) F (population), (vii) F (continuous), (viii) F (discrete),
(ix) F (attribute), (x) T.
SUBJECTIVE
1.9 (b) (i) Discrete, (ii) Continuous, (iii) Discrete, (iv) Discrete,
(v) Continuous, (vi) Continuous, (vii) Discrete, (viii) Continuous,
(ix) Continuous, (x) Discrete.
1.10 (i) Qualitative, (ii) Quantitative, (iii) Qualitative, (iv) Quantitative,
(v) Quantitative, (vi) Quantitative, (vii) Qualitative.
1.11 (i) ratio-level, (ii) ordinal-level, (iii) integng)-level, 
(iv) ratio-level, (v) nominal-level, (vi) s 
(vii) ordinal-level, (viii: Ratio in- tad of interval. PUN inal-level, (x) ratio-level. 
Е (1) 230207 (ii) 937.1, j 9 0.003599, (іу) 1.004, 
(у) 0.07000, (уі) 22.26. $2 
SP | 
Chapter 2.064445 
СТІУЕ і ә” ң 
(а) (i) Е (time series data), СУ (й) Т, (iii) Е (does), 
(iv) F (mutually exclusive (v) F (can), (vi) F (graphically), 
(vii) F (histogram), А (viii) F (cannot), (ix) F (one dimensional), 
(x) F (height), S (xi) F (mid points), (xii) F (lowest), 
(xii) FD, — XX (xiv) Е (qualitative), (ху) F (same). 
(b) (i) b, (i) d, > (iii) с, (iv) b. 
(v) d, (vi) с, (vii) d, (viii) d, 
(ix) a, (x) b. 
Chapter 3, Pp. 77-86 
OBJECTIVE
(a) (i) T, (ii) F (median), (iii) T,
(iv) F (is), (v) T, (vi) F (mode),
(vii) F (mode), (viii) F (second), (ix) T,
(x) F (geometric mean), (xi) T, (xii) T,
(xiii) F (negatively), (xiv) F (right), (xv) F (negatively).
(b) (i) c, (ii) b, (iii) c, (iv) c,
(v) c, (vi) a, (vii) c, (viii) b,
(ix) b, (x) d.
SUBJECTIVE 
3.14 (i) Median or Mode, (ii) Mode, (iii) Mean. 
315 (965 
3.16 (c) 2.13 


3.17 (i) 18; (ii) 17; (iii) 17.07
3.18 X̄ = 111.60; G.M. = 55.35; H.M. = 28.82. Here G.M. is the best average.
3.19 X̄ = 1037.73; G.M. = 377.2; H.M. = 186.7.

3.20 (a) Rs.2.75 per hour; (b) 69.25% marks.

3.21 (i) X̄ = Rs.10.41; (ii) X̄w = Rs.10.08

3.22 Rs.24.50

3.23 (a) 14.07 years; (b) 13.82 years


5% x-ray GM e 2s "eg 


a" gi 
325 y. Леви и, см.- кше 
2(n +1) : Y 3 1 
SU 
N 2 ger 


327 18.1% 
328 (b) H.M. - 40 on per hour; (с) Н.М. = 6.8 miles per hour. 
3.29 (а) 22.56 k.p.h; (b) (i) H.M. = 7.66; (ii) G.M.- 29.396 

3.30 G.M. = 49.18; H.M. = 48.47.

3.31 G.M. = 19.80; H.M. = 16.27.

3.32 (i) Median = Rs.1500; (ii) X̄ = 65.5; (iii) Median = 18.

333 Мейап=7; Q 261; О, 271; D, 271; Р. =74. 
3.34 Mean = 3.78; Median = 3; Mode = 3.

3.35 (ii) Median = Rs.9.04.

3.36 Median = 48.44; Q1 = 34.34; Q3 = 61.45.
3.37 Median = 67.72 inches; Q1 = 65.4"; Q3 = 69.4".


3.38 Median = 138.63.
3.39 Mean = 24.96 years; Median = 23.36 years.


3.40 Median = 41.4; Q1 = 32.46; Q3 = 50.72.

3.41 Mean = 79.57; Median = 76.77; Q1 = 57.0; Q3 = 99.14.

3.43 (a) Mean = 91.27 mg; Median = 96.58; Mode = 103.33.
(b) Q,=77.04 mg; О, = 109.06; Р, = 84.96; P,,-93.68. 


Rs.55.00 
Rs.60.00 
Rs.62,50 
Rs7275 — 


Rs.78.75 


Rs.82.25 
Rs.85.25 
Rs.90.50 
Rs.95.00 
Rs.100.00 


The mean is approximately Rs. Cs QU 

(b) Mean = 146.975; Median N26. 75; Mode = 147.20 
Mean = 11.10; Median x 7; Mode = 11.06. 

Mode = Rs.32.48; Man = 32.49. 

(c) Mode = 27


(a) (i) Median or Mode, (ii) Mean, (iii) Mean or Median,
(iv) Weighted Mean, (v) Mean, (vi) Median or Mean 
(vii) Mode, (viii) Mean. 


(x) F (standard deviation). 


3.44 Assuming a range of Rs.55.00 to Rs.105.00, the frequency distribution would be:


Chapter 4, Pp. 116-129 
(a) (i) F (Dispersion), (ii) F (zero), (iii) T,
(iv) T, (v) F (range), (vi) F (square root),
(vii) F (relative), (viii) T, (ix) F (range),
(b) (i) c, (ii) c, (iii) c, (iv) d,
(v) b, (vi) b, (vii) b, (viii) a,
(ix) a, (x) a, (xi) c, (xii) a,
(xiii) c, (xiv) b, (xv) d.
SUBJECTIVE 


4.4 (b) Q.D. = Rs.9.78


4.5 Median = 152 lb; S.I.Q.R. = 7.5 lb; Mean = 152.75 lb; S.D. = 10.52 lb; Mean of original
data = 152.917 lb; S.D. of original data = 10.32 lb.


4.6 (b) X̄ = 46.17; M.D. = 11.28


4.7 Group A: Q.D. = 1.285; M.D. = 1.45; Co-efficient of Q.D. = 0.02;
Co-efficient of M.D. = 0.024.


Group B: Q.D. = 1.435; M.D. = 1.60; Co-efficient of Q.D. = 0.012;
Co-efficient of M.D. = 0.026.


4.9 (b) σ² = 6.85, σ = 2.62.
4.11 (b) Mean = 32; S = 5; (c) Mean = 74.1, S = 1.35.


4.13 (c) (i) S = 2; (ii) S = 2; the answers of (i) and (ii) coincide because the standard deviation is
unaffected if a constant is added.
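
The remark above can be checked numerically. The following is a minimal Python sketch, assuming a small hypothetical data set and an arbitrary constant of 10 (neither is taken from the exercise); it only illustrates that adding a constant shifts every observation and the mean by the same amount, so the deviations from the mean are unchanged.

```python
import statistics

def shifted(values, constant):
    """Return a new list with the same constant added to every observation."""
    return [v + constant for v in values]

# Hypothetical observations, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation before and after adding the constant.
sd_original = statistics.pstdev(data)
sd_shifted = statistics.pstdev(shifted(data, 10))

print(sd_original, sd_shifted)
```

Both printed values agree, since each deviation (x + c) - (mean + c) equals x - mean.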


4.15 X̄ = 33.9 in; S = 1.507 in.
4.16 30.886 and 36.914; Contains Neat 
4.7 Place A: Х = Rs.106.32; 538530,6. 
Place В: Х=Вз. 106. з= 5.327. 
418 — X-Rs.12.006: К 32.626. 
419  "XY-Rs90 ТУ 15.99; 65%; 95%; 100%. | 
4.20 X̄ = 14.23; s = 0.72 in.; smallest size = 12.82 in.; largest size = 17.14 in.


4.21 Actual class-intervals are: 109.5 - 115.5; 115.5 - 121.5; 121.5 - 127.5; 127.5 - 133.5;
133.5 - 139.5; 139.5 - 145.5; 145.5 - 151.5; 151.5 - 157.5 (h = 6, P.M. = 136.5).


4.22 Source A: X̄ = 1060 hours; s = 21.1 hours


Source B: X̄ = 1060 hours; s = 22.2 hours
These data give a false impression as the distribution is
U-shaped.
4.24 (b) 13.87%.
4.25 (b) s = 83; C.V. (A) = 16.58%; C.V. (B) = 9.50%.
Locality А has a greater relative dispersion. 




4.26 C.V. for X = 19.25; C.V. for Y = 25.58.
Candidate X showed more consistent performance.
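
The comparison above rests on the co-efficient of variation, C.V. = 100 × s / X̄: the smaller the C.V., the more consistent the performance. A minimal Python sketch follows, assuming two hypothetical sets of scores (the names `candidate_x` and `candidate_y` and their values are illustrative only and are not the data of the exercise).

```python
import statistics

def coefficient_of_variation(values):
    """C.V. as a percentage: 100 * (population s.d.) / mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return 100 * sd / mean

# Hypothetical scores for two candidates with the same mean.
candidate_x = [52, 60, 48, 55, 65]
candidate_y = [30, 80, 45, 70, 55]

cv_x = coefficient_of_variation(candidate_x)
cv_y = coefficient_of_variation(candidate_y)

# The candidate with the smaller C.V. is the more consistent one.
more_consistent = "X" if cv_x < cv_y else "Y"
print(round(cv_x, 2), round(cv_y, 2), more_consistent)
```

Because the C.V. is a pure ratio, it allows series with different means or different units to be compared, which the standard deviation alone does not.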


(a) s (corrected) = 2.79; C.V. = 4.14%.

(b) C.V. for batsman A = 117.67%; C.V. for batsman B = 70.45%.

Batsman A is better as a run getter but batsman B is the more consistent player.

(b) C.V. = 9.33, 9.17.

(c) Tube B has a greater absolute dispersion. Tube A has a greater relative dispersion.
Town A: s = 11.88, C.V. = 21.15%.

Town B: s = 12.95, C.V. = 23.59%.

(A): C.V. = 24.99; (B): C.V. = 23.54; (C): C.V. = 22.80.

(a) X̄ = 57.06, S = 8.75.

(b) X̄ = 16, S = 7.2, C.V. = 45.


(c) 78, 15.


Average score of student 4-59; Average score of student mw 
(b) Trimmed: mean = 71.33, s.d. = 5.83
Winsorized: mean = 71.2, s.d. = 6.61.


(а) m, 20, m, =1.5, m, =0, m, -6,% > 
"106 0 2405, ті = ұр 7 
(b) ¥=8.16; m, =57.825; ages b, =2.76. 
= 0; т, = 2.49; т nee m, = 18.33. 
=0; m, =6, 3139s, --5.125; т, =82.58; 
b, = 0.104; b, = 2.071 
m, =0, m, =13.76, m, 23.16, m, = 528.06; 
b, = 0.004; b, = 2.79. 
(b) b, = 0.0003; b, = 2.75. 
b, = 0.0002; b, = 2.97; b (corrected) = 0.002; 
b, (corrected) = 2.66. 
(b) (i) Symmetrical; (ii) Negatively skewed; (iii) Positively skewed. 




4.47 (i) 0.32; (ii) 0.06.
4.48 m1 = 0; m2 = 2.2081; m3 = 0.1949; m4 = 12.9646; b1 = 0.0035; b2 = 2.66.
The distribution is slightly positively skewed and is Platy-kurtic.
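
The two moment ratios quoted in 4.48 can be reproduced from the central moments with the usual definitions b1 = m3² / m2³ and b2 = m4 / m2². A minimal Python sketch, assuming those standard definitions (the function name `moment_ratios` is introduced here for illustration):

```python
def moment_ratios(m2, m3, m4):
    """Return (b1, b2) from the second, third and fourth central moments."""
    b1 = m3 ** 2 / m2 ** 3   # measure of skewness
    b2 = m4 / m2 ** 2        # measure of kurtosis (3 for a normal curve)
    return b1, b2

# Central moments as quoted in the answer to 4.48.
b1, b2 = moment_ratios(2.2081, 0.1949, 12.9646)
print(round(b1, 4), round(b2, 2))
```

A value of b1 close to zero with a positive m3 indicates slight positive skewness, and b2 below 3 indicates a platykurtic distribution, in agreement with the stated conclusion.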
4.49 (i) (b) is more consistent. (ii) (b) is negatively skewed.
(iii) None of the distributions is mesokurtic.
4.50 (a) 3; (b) b1 = 0.49; b2 = 0.65; Platy-kurtic.
4.51 (a) (i) Second, (ii) Neither, (iii) First;
(b) (i) Greater than 1875, (ii) 1875, (iii) Less than 1875.
4.54 (i) Mean = 80, s.d. = 8.944; (ii) Mean = 75, s.d. = 11.18.
4.55 (i) sk = 0.413, C.V. = 25.43%
(ii) Mean = 3633.33, S.D. = 796.70
(iii) Mean = 3446.6630, S.D. = 876.37.
Chapter 5, Pp.
OBJECTIVE 
(a) (i) F (un-weighted), (ii) F (Laspeyres), (iii) F (two),
(iv) F (geometric mean), (v) F (geometric mean),
(vi) F (weighted price), (vii) F (33.3%), (viii) F (183),
(ix) T, (x) T.
(b (i) a, (ій) d, (iv) с. 
(v) b, (vi) d, (vii) a, (viii) b,
(ix) c, (x) a, (xi) d, (xii) d,
(xiii) b, (xiv) d, (xv) d.
SUBJECTIVE


5.20 


5.21 
5.22 
5.23 
5.24 
5.25 
5.26 
5.27 


(i) 100, 99.9, 101.0, 104.7, 108.9, 110.6. 

(ii) 95.5, 95.5, 96.5, 100, 104.0, 105.6.

(i) 100, 137.9, 155.3, 166.7, 181.2, 196.8, 200.0, 223.4, 234.8, 241.5. 
100, 100.6, 89.3, 92.9, 118.3, 114.7; 100, 91.6, 82.1, 8:7, 110.7, 113.4. 
(i) 94.38, 97.67; (ii) 94.65, 98.60; (iii) 94.14, 97.59.

100, 99.2, 74.4, 53.9. 

88.5, 88.9, 94.2, 94.7.

100, 107.7, 109.5, 115.0, 114.4, 118.6. 

(i) 100, 101.8, 109.8, 125.8, 130.0 (ii) 100, 101.8, 115.8, 132.6, 138.2 
5.28 116.6.

5.29 (a) 84.14, 84.13; (b) 85.22, 85.22.
5.30 (i) 101.6, (ii) 106.4.

5.31 (i) 116.30, (ii) 116.30.

5.32 (i) 115.16, 110.70; (ii) 115.37, 107.28.


5.33 P (Marshall-Edgeworth) = 86.5, P (Fisher's) = 86.8
5.34 (i) 49.4; (ii) 202.5.
5.35 Index for 2004 = 92.68; Index for 2005 = 99.01 


5.36 (i) 99.06, 103.84; (ii) 99.06, 103.92; (iii) 99.06, 103.88; (iv) 99.06, 103.89;
(v) 99.06, 103.88; (vi) 99.06, 104.02.


5.37 (a) (i) 100, 97, 94, 82; (ii) 100, 101, 104, 161.
(b) Index for 1961 = 118.19; Index for 1962 = 120.00.
126.72.


Quantity Index for 2007 on 1997 = 129.8, Quantity Index for 1997 on 2007 = 76.3


Price index for 2007 with 1997 = 118.0 and price index for 1997 with 2007 = 83.9
100.33 (in both cases).


М 
(i) 173.8; (ii) 70.0.
(ii) 98.15. The prices in 1929 as compared to prices in 1928 have fallen.
(i) 121.23; (ii) 121.22


(i) 116.4; (ii) 116.4.
124.35.

(i) 130.48; (ii) 165.34

(i) 70.24, (ii) 114.18, (iii) 205.97


Chapter 6, Pp. 233-243
OBJECTIVE

(a) (i) F (a fraction), (ii) F (dependent), (iii) F (equally likely),
(iv) T, (v) F (not mutually exclusive),
(vi) F, (vii) T, (viii) T,
(ix) F (are not equal), (x) F (P(A∩B) = P(A) P(B)).

(b) (i) c, (ii) c, (iii) c, (iv) b,
(v) a, (vi) c, (vii) a, (viii) b,
(ix) a, (x) b, (xi) d, (xii) a,
(xiii) a, (xiv) b, (xv) c.


SUBJECTIVE
{chair, student}, {chair, pen}, {student, pen}, {chair}, {student}, {pen}, φ.
6.3 (i) {(1, 1), (1, 2), (2, 1), (2, 2)}; (ii) {(1, 2), (1, 3), (2, 2), (2, 3)}; (iii) {(2, 1), (2, 2), (3, 1),
(3, 2)}; (iv) {(1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}; (v) {(2, 3)}.


6.4 (b) (i) {5}; (ii) {1, 3, 4, 5, 6, 7, 8, 9, 10}; (iii) {2, 3, 4, 5}; (iv) {1, 2, 5, 6, 7, 8, 9, 10}.


6.5 (1) {0, 2, 3, 4, 5, 6, 8}; (ii) ф; (iii) (0, 1, 6, 7, 8, 9}; (iv) {1, 3, 5, 6, 7, 9}; (у) {0, 1, 6, 7, 8, 
(vi) {2, 4}. 


6.6 (a) A = {(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (2, 2)}
B = {(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)}
6.7 (a) (i) 4; (ii) 24; (b) (i) 2730; (ii) 455.
6.8 635,013,559,600.
6.9 2520. 


6.12 (a) The investment counsellor's claim is wrong, as the sum of the probabilities of mutually
exclusive events cannot exceed unity.


(b) The given statement is wrong, as the probability of each of the outcomes is not 1/3.


(c) The given statement is wrong, as the sum of the probabilities of three mutually exclusive events
cannot exceed unity.


(d) Same remarks as for (c) above.


45% 


Бс o 2: 603 se 0» 2; OFS 


Bias shied pact TT: м 4 
614 (Ы (i) 3 6) 55 (iii) rS 5: (у) s 


|= 
Ts 
t^ 
> 
m 
N 
= 
ШЕ 


E E 
6.15 ipee ee re me red EE 
5 36 36 3 5 35.20 36 36/36 36 36736 12: 
1 X 
ө- D 
6 
1. 2371 99,4 (5 
616 —L LLL 
($363: 93663618 


és. L4 wes ан 

н 15115115115 11515151515. 
| 1 15 
618 (i) =; Gi) —. 
ДҮ ЗӨ 


25 1 
6.19 0.5177; 0.4914 =. 
(b) (6) —— 2168 
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8 5 
6.20 ---; -. 
Cen 0058 


621 (а) (i) 5; Ў ш 53 p. 3° (b) (i) 0.38, (ii) 0.62, (iii) 0.12 


6.22 pos (b) (i) —; (ii) E (0) —— E (ii) 2 
j 225 - 19 a 143 


623 (а) (i) 0.598, (ii) 1— В (5) 26; (с) 5. , 


уб Us 
6.24 ? 1001 ----; б таз 
aD er S enel 49 
625 (а) (i) 7 (ii) 30° (iii) 6 (b) 143 4. SS 
6.26 PR 1с б 1; (а) 2. ov 
305 Be T 5 RX 
2 9 45% 
627 = (с)— ху 
3 20 One 
6 ор 
628  (— S 
26 K 
e 
5 N 
6.29 өз À 
NS 
630 (b) 2; (c) 0.9; (ii) 0.6. 


631  (b)() HIE (ii) =; di) — 2 9 9 8° Gi) D (iii) 2: (iv) i 


2 2 
633 (b) (i) —; (i) — 
(b) (i) 15 @ 15 


5 1 
6.34 =; (6) =; (с) 0.25. 
(a) 9 (b) 3 (c) 
6.35 (i) 0.0779; (ii) 0.4062; (iii) 0.5.
6.36 (c) 0.60, 0.76, 0.60, 0.40, 0.60.


6.41 
6.42 
6.43 
6.44 
6.45 
6.47 


6.48 


- . (b) (i) 0.3; (ii) 0.5; (с) (i) 0.10, (ii) 0.20; (iii) 0.17. 


P(A)


4 10 Ec 
п 00 —7, iS, (00 9 


5-25 S 
@= (b) 0.012 


815 7 
9 72? (ii) 31 
(a) True; (b) False, if A and B are independent, then P(A/B) = P(A); (c) True; (d) False,
independence does not mean that two events have equal probabilities.
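
Statements (b) and (d) can be illustrated by direct enumeration. The following is a minimal Python sketch, assuming a hypothetical experiment of rolling two fair dice with A = "first die shows an even number" and B = "the total is 7" (these events are chosen here for illustration and are not part of the exercise).

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

def first_is_even(outcome):
    return outcome[0] % 2 == 0

def total_is_seven(outcome):
    return outcome[0] + outcome[1] == 7

p_a = prob(first_is_even)
p_b = prob(total_is_seven)
p_ab = prob(lambda o: first_is_even(o) and total_is_seven(o))

# Conditional probability P(A/B) = P(A and B) / P(B).
p_a_given_b = p_ab / p_b
print(p_a, p_b, p_ab, p_a_given_b)
```

Here P(A/B) equals P(A), so A and B are independent, yet P(A) and P(B) are different; independence therefore says nothing about the two probabilities being equal.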


(i) not independent; (ii) not independent; (iii) not independent. 


63 63 9 


65 
(а) --- (b = 
176 Кы 
7 . eS 
429 | VAS 
5 S 
= AS 
1218 *Q 
Жау ы 
(i) 32° (ii) 32 o» 
2 94 > И 
— or —— (Іп this case afe also favour). 
452-3175 А 


(i) 0.12; (ii) 0.88; ү 0.38; (іу) 0.38 


19 
Fi 2 ; dj-— 
® = (c) --- 2 6 (4) 27 
901 
1680. 
0.586. 


(а) 221. 
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(a) B. (b) POM (ii) dL 
64” қыз 15 




11 

1000 % 

29 i e 
(i) 0.256 (ii) 0.415 (iii) 0.328
0.0020; 0.0096; 0.9883.


Chapter 7, Pp. 
OBJECTIVE
(a) (i) F (any), (ii) F (discrete), (iii) F (continuous),
(iv) F (Σ x P(x)), (v) F (Σ (x - μ)² P(x)), (vi) F (fixed set of values),
(vii) F (constant), (viii) F (independent), (ix) F (one),
(x) F (at least).
(b) (i) d, (ii) a, (iii) d, (iv) a,
(v) b, (vi) a, (vii) b, (viii) b,
(ix) a, (x) c.


bd 

4'4'2 

(c) 0.3038, 0.4389, 0.2135, 0.0412, 0.00264. 
115 80 90 24 1 


(b) (i) s Gi) 


120 225 100 10 К 1 15 30 10 


` 455" 455°455°455 565656756 


7.5 


7.6. 


7.7 
7.8 
7.9 


‚7.10 


АВ 


7.12 


7.13 
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2; 
b) >. 
(b) 8 2% 
1 o, 5hitps://stat9943.blogspot.com 
8 3258 : 
o $ (ii) 0, qin i ww 2 45 
16 Е 
Gi) 3x! -2x; (iii) Te (iv) b = 0.6130. 
3 
à a) 3. 
412 214 S. 
=—,-,2; h Te ais Е 
(b) g(x) 15°3°5 (у) 53715 © 
Conditional p.d. of X given that Y- (Хх клоп; =. 
2L. T Q 
/2)==,—,— S 
о S59 Fol 
5 5 1 14 279: 
Bd — =й 
UTE 6363 об R Ju. 
f(x!) = 2.2.0 Non- 0, 
(b) X-and Y nin dent. 
6 gQ)-—., х=2,4,5; h(y) - 2- "ix -1,2,3. 
Хапа E are independent. 
6) gx) ==, x=1,2,3; (y) =% y= =1,2. 
X and к аге independent. 
(со)-2 «x; h(y) e yy; ЛОГ) =O, 
2 : ly 


2 
fox) = 22640). 


1+—х 
2 
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7.44 (a)Itisap.df (b) ož Gi) — di) 45. 2 (iv) Š 


" 
715 (6) g(x) К ОТ әйелі 
IS E 
/(у!х) = m 
pto EO Tuc DM 


бу(х+ у), 769 


/x)= кесер 
Sy!) =~ A 
717 g(x)=12x"(1—x); /(у/х)-2у. The ite eNO independent. 
719 . (97. M 
7.20 (a) 105; (b) E(X) does not exist.
3 9 o> 
7.22 (a) —;— (Ы) (i) 0.55; 1.35; (ii) 2.1; % 
_ à 
9 
a 


7.23 (а)20 oig ng en 


15 9 
724 (7 (0275, --қ 
(5)7 (с) TN 


, | 

35 Hub» 

1:25 7, —=; e 2 
OG 7] | 

7.26 (a) Rs.6 and Rs.5; (b) £64, £48, £36, £27; (c) Rs.9, Rs.6, Rs.3


727 (а) и-0.6,с-02.. 


43-142. 


b 35 .39 
( TE 0 14 2 Te Vie TE 
728 (а) d diio Жа (b) Es с = 0.66 . 


3 63 81 
р=0; с =1. Е 
| odi or Me 
EE MEIST uc ТО 
x | 7 
11 kem hftps/stat9943:blegspot.com 
733 — 4, 0.001. | | 


2 
734 (a) u=0,0" => М.р:=*. 
9 ж=0, =й 


735 u= 


7.36 


2 


1 1 
‚=0, Ha —-— M D.=— 


i 
12 80” 4 


SIS 
"9 =т= = 2.184 
а 


μ1 = 0, μ2 = 1.29, μ3 = 0, μ4 = 3.86, β2 = 2.33.


737 
738 · 
7.41 (а 
ži S 2 
10/30 8/30 4/30 2/30 =f 
(b) Rs.4, Rs.3, Rs.2 and Rs.1.
7.42 (b) Mean = 12; pa 36 , 
(c) Expected value = 6,00,000, S.D. = 12,000
Chapter 8, Pp. 329-340
OBJECTIVE
(a) (i) F, (ii) F (two), (iii) F (fixed specified),
(iv) T, (v) F (not equal), (vi) T,
(vii) F (equal), (viii) F (small, large), (ix) T,
(x) F (one), (xi) F (independent), (xii) F (less),
(xiii) F (positively skewed), (xiv) T, (xv) T.
(b) (i) d, (ii) b, (iii) c, (iv) b,
(v) a, (vi) a, (vii) b, (viii) b,
(ix) d.
SUBJECTIVE 
8.2 (b) 0, 0.29, 0.936, 0, 0.352.
8.3 (a) (i) 0. 1317, --- I E (b) @)! 0.28, (н) 0.31, ҺЕН 0.23. 


2447243 


-8.4 
8.5 


2286 


8.7 


8.8 


8.10 
8.11 


8.12 
8.13 
8.17 


8.19 
8.20 


8.21 
8.23 


8.26 


8.27 


8.28 
8.29 


8.9 


8.18 
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nis 57 $7 21 
, b , PS 
(a) (i) — 64 (n) 64 qug 27 64 (i) 32 


(8) 9,312; E n blogspot com 


(a) @ 2816. (ну >; (ь (0.0055; (н) 02903 ` 
3125' дора 


T 86, ші ео 
243 243 243 243 
(а) 0.3134, (5) 0.10737, 0. 99363 
(а) 0.4374, 6.124, 35.72, 111.132, 194.48, 181.5, 70.6. 
(b) Expected frequencies are: 3, 15, 30, 30, 15, 3. 


S 


(a) 27.3196 : (b) 1 approximately. | Ke 
52.08, 41.67, 12.50, 1.67, 0.08; Mean = 
ғ ы. P" 


(c) 0.0073; 
(a) р= 0.36, п= 100: (b)p =. иу 41; (с) No, it makes q = 1.8 which is wrong. 
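
The objection in part (c) follows from the binomial relations mean = np and variance = npq, which give q = variance / mean; a value of q outside the interval from 0 to 1 cannot be a probability. A minimal Python sketch, assuming illustrative inputs (the pairs mean 36, variance 23.04 and mean 5, variance 9 are chosen here for illustration and are not quoted from the exercise; `binomial_parameters` is a hypothetical helper):

```python
def binomial_parameters(mean, variance):
    """Recover (n, p) from mean = n*p and variance = n*p*q.

    Returns None when no binomial distribution can have these values,
    i.e. when q = variance / mean is not a valid probability below 1.
    """
    q = variance / mean
    if not 0 <= q < 1:
        return None
    p = 1 - q
    n = mean / p
    return n, p

# A consistent pair: q = 0.64, so p = 0.36 and n = 100.
consistent = binomial_parameters(36, 23.04)

# An impossible pair: q = 9 / 5 = 1.8, which exceeds 1.
impossible = binomial_parameters(5, 9)

print(consistent, impossible)
```

Since q = 1 - p, a variance larger than the mean can never arise from a binomial distribution.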


` (b) Median = 5, Mode = 5. N 


(a) 0.65, (b) Theoretical frequencies are 8, 56, 155, 192, 89.
p = 0.25 and the expected frequencies are: 35.60, 71.19, 59.33, 26.37, 6.59, 0.88, 0.05.
p = 0.32 and the expected frequencies are: 32, 60, 43, 13 and 2.


X̄ = 5.42 and s = 1.70.
(b) E(X) = 9; Var(X) = 2.25; 0.3907.


840 84 
4627121 

4 12 4, 3 

5) =. 

(20720 20720” 20: 914 
0.004, 0.50. 


(a) 0.3179; (b) 0.82.
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8.31 


8.32 
8.33 


8.34 
8.35 
8.36 
8.37 
8.38 
8.39 
8.40 
8.44 
845. 
8.46 
8.47 
8.48 
8.49 
8.50 
8.51. 
8.54 
8.55 
8.56 - 
8.58 
8.59 

. 8.60 
8.61 (b) 

(©). 

862 (b 


(i) 0.2682; (ii) 0.0614
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Bee 
ЖДТ 
а) 2—2^——^—0.0240 (b) b(3; 4, 0.2) = 0.0256. 


(а) 
150 


(е) 0.2019, 0.3230, 0.2584, 0.2167 
(a) 0.04918, 0.1494, 0.2241, 0.2241, 0.1681. 
(b) 0.2644, 0.1037 (c) 0.135, 0.857. 
(b) (i) 0.99994 (ii) 0.9442 

(b) (i) 0.1937 (ii) 0.1839 

(а) (i) 0.9513 (ii) 0.9989; (b) 0.221. 
(b) 0.2231, 0.1913.

0.5620
(b) 0.1839.


Expected frequencies are: 202.16, 137. M 59 10.72, 1.84,:0.24. 


(a) The statement is wrong. (b) 51 and 11. 


(b) (i) 0.642. (ii) 0.073.

123, 110, 49, 14, 3, 1, 0.

90.3, 108.4, 65.0, 26.0, 7.8 ea 0.1; 0.0341. 
(b) 0.3679. 


(a) 0.5272, 0. ік. 2231 


(i) 0.4380; (ii) 0.5620 
(b) 0.10033 

(b) no, (c) 0.1172. 
(b) 0.0515. 

(c) 1/8. 

(b) 0.0129 


(a) 0.09; (b) (i) 0.135, (ii) 0.081, (iii) 0.081. 
(i) 0.1432; (ii) 0.0682 
9.09 
A» 9.5217, (ii) 0.0059, (ii) 0,00000(1. 
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Chapter 9, Pp. 411—420 


OBJECTIVE 
9) өт, ч https://stat9949.blpiispot.cOPT o». қ 
(vii) F (equal), (viii) F (zero, one), (ix) F (one),
(x) F (equal to), (xi) F (1), (xii) F (whole & fraction),
(xiii) F (equal to), (xiv) F (will), (xv) F (not same).
(b) (i) c, (ii) d, (iii) d, (iv) b,
(v) c, (vi) d, (vii) b, (viii) d,
(ix) c, (x) b.
SUBJECTIVE 
91 (b) Кез ey mE VY Be 
SATA Euge Pag! S 
9.2 (b) 0.2231.
| ом 
93 (a) H=2;0°= 4;—1—: 02231; 02232. © 
1-21 << 
(b) 18.1%, 49.8%.
94 (твт: = =0, 25 <i = 2a! , ц, 7 9a*. 


95 д= 0,4, = 2,4, = o» 
97 (ә д=0,0° =2 SIT 0.6014. - 


9.11 (a) The distribution is Еа п) variate. 


9.15 b u-23o-412- 3.464, 


9.18 (b) (i) 43.82%; (ii) 67.05
9.19 (a) 0.5000; 0.3694 (b) (i) 0.1672; (ii) 0.7492.
9.20 (a) 0.6147; (b) 0.4822; (c) 0.9973.


9.21 (a) 0.3085; (b) 0.6915; (c) 0.0548; (d) 0.1832; (e) 0.6898;
(f) 0.9452.


9.22 (b) (i) 0.4052; (ii) 0.3745; (iii) 0.2358
9.23 (a) (i) 0.0663; (ii) 0.0062; (iii) 0.9198 (b) 69.15%.


9.24 
9.25 
9.26 
9.27 
9.28 
929 
9.30 
9.31 
9.32 
9.33 
9.34 
9.35 
9.36 
9.37 
9.38 
9.39 
9.40 

- 9.41 

А: 9.42 


9.43 


-9.44 


9.45 


‚ Frequencies: 6.8, 30.7, 100.4, 210.4, 269.6, 221,0, 114.6, 37.6, 8.9. 
(a) (i) 0.0918; (ii) 0.5375 (b) 0.3983; 0.4649.
(i) 6; (ii) 131; (iii) 880; (iv) 24.


a) (i) 26,600; 22,660 i) 2 9: (ij) 6 (їп) 2284 s 

оог РЗА 3, Bigaspot‘com 
(i) The old route is better; (ii) the new route is better. 

(a) 99.36; (b) 0.263 | 

(i) 0.4404; (ii) 12.43, 82.77; (iii) 26.86, 39.11, 85.28.

(a) (i) 570; (ii) 0.4052.

(a) 0.38% (b) 0.4514, (c) 23, (d) 189.95.

(a) 168 cm, (b) 6.24 years.

87 inches. 


(a) 1512.5; 6-6.67 (b) 50; 10 (c) 4471.46, С? қа” 
40.37; 12.32. eV 
μ = 1.7905 m, σ = 0.0706 m, 2.009 m.
(b) 0.0143. A 


(b) (i) 0.9962; (ii) 0.0681; (iii) 0.055 Sy е 


(а) (i) 0.1925; (ii) 0.2177 (b) (i) ер (ii) 0.6970. 
(a) 0.599 (b) 0.7469, (i) 0.624 0.7462. 
(a) (i) 0.3251, (ii) 0.1781, (iii) 0.2803, (iv) 0.0459.


E У 
(a) = 67.9; 5 = 2. ipprox. Proportions are 69%, 95% and 100%. The distributios 
nearly normal. А, - 


(b) Fitted frequencies by the area method are:
2.55, 5.70, 15.30, 32.62, 59.25, 92.78, 116.18, 124.65, 114.82, 84.38, 54.52, 28.05, 1 
6.68. ; : 


The equation to the fitted normal distribution is 


: 28 ЧЕ е ыз 
x) = WY 502 
ое е | A 9.50 J| 


Frequencies: 4.42, 10.00, 23.58, 41.61, 54.82, 57.20, 43.93, 26.80, 12.12, 5.52, 
Ordinates: 3.14, 9.86, 23.31, 41.31, 56.03, 57.50, 44.71, 26.25, 11.85, 3.98. 


ч» 


у 
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Chapter 10, Pp. 449-456 


OBJECTIVE 


(a) Y = 27.63 - 1.063X (b) 17.00, 15.94, 11.69, 9.56, 7.43, 6.3

(b) Ŷ = 2.00 + 0.387X. (d) 5.87


10.6
Σ(Y - Ŷ)² = 12.995, s_y.x = 1.802


‚8. (a) У-30-Х (b) Y = 364.68 + 0.095x; SS 
HOLE -3.17 +0.32X, (d) Y= assa 

(e) P = 53.68 + 3.53X. ES 

(b) P = 130.62 — 0.57X; (NB. 


(a) = 0.43 + вд бл: (с) X = 0.27 + 0.47Y. (d) 0.20. 
(b) 20:95. У 


(b) r = -0.92 (c) r = -0.92.

(b) r = 0.93.

(a) (i) 0.7; (ii) -0.7 (b) r = 0.70.

(i) 720.91 (ii) Y = 33.3 + 143X. 

r= 0.94. 

к= 461. 

r = -0.98; Y = 31.55 - 1.94X and X = 16 - 0.5Y.
(a) r = 0.876 (b) r = 0.97
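
The correlation co-efficient and the two regression lines in the answer above are linked by r² = b_yx × b_xy, with r taking the common sign of the two slopes. A minimal Python sketch, assuming the slopes are read as -1.94 (Y on X) and -0.5 (X on Y); the function name `correlation_from_slopes` is introduced here for illustration.

```python
import math

def correlation_from_slopes(b_yx, b_xy):
    """Return r from the two regression co-efficients.

    r is the geometric mean of the slopes and carries their common sign.
    """
    if b_yx * b_xy < 0:
        raise ValueError("the two regression co-efficients must have the same sign")
    r = math.sqrt(b_yx * b_xy)
    return -r if b_yx < 0 else r

# Slopes assumed from the two fitted lines quoted above.
r = correlation_from_slopes(-1.94, -0.5)
print(round(r, 2))
```

The same relation gives a quick consistency check on any pair of fitted regression lines, since the product of the two slopes can never exceed 1.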


(a) (i) F@ indi ion is), 
© re rtteissi/etat9943 Io gspetieori 
(v) F (correlation analysis), (vi) F (must not both),
(vii) T, (viii) T,
(ix) F (is always positive), (x) F (zero).

(b) (i) a, (ii) a, (iii) a, (iv) d,
(v) c, (vi) b, (vii) a, (viii) a,
(ix) b, (x) c.

SUBJECTIVE 


(b) (i) 1234; (ii) 1168.45; (iii) 65.55; 95% variability is explained by regression model. 


10.22 r = 0.898.
10.23 r = 0.67.
1024 (a) r- -0,415; (b) r =0.75 8 r= Wa (d) ғ--0.94. 
10.25 r = 0.58; Y = 11.32 + 0.54X.
10.26 r = 0.39.
1027 — (b)r-09883; P-033X-5409; X=3.03Y- 192.16 
= 1 
©1028 г=—. 
2 
10.29 (b) r_s = 0.80.
10.30 (b) r = -0.6.
10.31 (b) r_s = 0.8545.
10.32 Denoting the judges by 1,2,3; ғ = -0.21; 15 = 0.383 = 0.64. This suggests that j 
and 3 have the nearest approach to common tastes XÑ ° - 
1033 1, = 0.625; r,, — 0.503; г = 0.673. Pair (Z, Kès the nearest approach to common 
1034 7-03. ` 49 
10.35 W = 0.21.
Chapter 11, Pp. 478-484
OBJECTIVE
(a) (i) F (co-effigient of Multiple correlations), . (ii) Е (Partial co 
(iii) F (sa-ragfü, ^. (iv) Е (positive), (у) T, 
(vi) F (0v (vii) F (n— k - 1), (viii) T, 
(ix) T, (x) F (change).
(b) (i) c, (ii) b, (iii) a, (iv) c,
(v) b, (vi) d, (vii) c, (viii) c,
(ix) a, (x) b.
SUBJECTIVE
11.2 а-348, b, - 2.09, b; - 2.65. 
11.3 (a) Ŷ = 4.49 - 0.04X1 + 0.64X2.
114 . (a) Х, = 0.04 + 0.21X; + 0.285; (b) 0.21; (c) 0.9863; 0.99. 
11.5 (а) X,= 61.40 — 3.65, + 2.54%, (b) 40; (с) 0.9927. 
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(a) Ў = 20.1084 + 0.4136X, + 2.02536, (b) Ў = 57.5 kg 

(b) Riz = 0. 77 

(а) rj = 0.952, 713 https; / 5 3t9943.hlagspot. com 
В з= 0.98; Ёз) = 0.82. 

(b) Ёз з = 0.83 (c) Roy = 0.68 

(а) А, 2з = 0.80 (b) R213 = 0.66; тз. =.0.64. 

(b) 0.48. 

ri33 = 0:759; ra = 0.436; ri = 0.097. 

X, = 9.22 + 3.37X + 0.00385. 


Х,--26.77 %0.39Х, + 0.2325; 
тз 0.55; тз: = 0.15; з = 0.21. on 
ri237 0.63; тз2 = 0.49; тз = —0.035. 

(b) (1) Not possible; (ii) Not possible; (iii) Is ae 

тзз= 0.586; ғ = 0.874, ra = 0.836, тз = 0888 


(c) R35 = = (02 - T4 2) /A- -n) which js con zero. 


(b) r237 7527 734 = –1. X 
Y= 2.08 + 0.64 X - 0.1 X^; AP 


Chapter 12, Pp. 503-510


ТЕСТІУЕ 
(c) Y= 1.52 + 1.66X 


(c) Y= 7.2 + 1.28X. The calculated values are 7.20, 8.48, 9.76, 11.04, 12.32, 13.60, 14.88, 
. 16.16, 17.44. 


(b) Y= 1.2 + 0.5X; 5.0001. 
(b) Y= 11.25 + 0.70X. 


(a) Y = 6.32 + 0.84X (b) Y= 0.9 + 4.7X and the estimated value of Y = 29.1; and X = 0.09 + 
0.19Y and the fitted value of Y = 31.1. 


(c) Y= 0.83 + 160Х- ОТ: 
(b) Y=1.172 +0.056Х + 2.28X* (c) Y=1.428 + 0.244Х + 2.214X* 
Y= 1.04 – 0.20X + 0.24% 
12.9 Y = -6.622 + 1.033X - 0.0056X²

12.11 (i) Y= 123.2 + 8.29X; 9008.4. 

(i) Y= 249tt5 $4 5tet9 O43. blogspot. com 
12.12 (i) Y= 0.32 + 1.13X; 5.819; 

(ii) Y = 1.42 - 1.07X + 0.55X²; 1.584;

(iii) Y = 1.03 + 1.725X - 1.40X² + 0.325X³; 0.063.
12.13 (b) Y-10- 5X 4 2X". 
12.14 (b) Y = 74.04 (1.22)^X and the estimated value of Y = 244.13.
12.15 Y = 32.15 (1.427)^X and the estimated value of Y = 387.40.
12.16 Y = 0.8576 (1.393)^X or Y = 3.22 (1.393)^X with origin at 4.
12.17 Y = 8.478 (1.195)^X.
12.18 Y = 2.39 (1.19)^X; 19.54 for X = 12.


S 


12.19 (а) У= 158.5 2.17) (b) Y= 2.50+ 3504.5 


1220  Y-2033 (Xy "^. : SS 

1221 Y - 6483.1 (X) ' and the eina of Y= 12.76. 
1222  a-5703,n = 1.02; S 
| Ж қ» 

12.23 у-9.998 е ҚЫ 


12.24 (а) Y=9.88 eX “сө 
1225 log A= 12.1669" 23064 
1226 y= 1.42. 


12.27 (b) Y = 0.10 + 0.025X.

12.28 (b) Y = 4.49 - 0.04X1 + 0.64X2.

12.31 (b) X = 0.034, Y = -0.305

12.32 (a) X = 1.162, Y = 2.262; (b) X = 1.02, Y = 2.50; Σe² = 0.1292


12.33 (a) X = 0.9997, Y = 2.0010; (b) X = 2.31, Y = -1.02.
12.34 X = 1.17, Y = -0.75, Z = 2.08.
12.35 (b) X = 2.47, Y = 3.55, Z = 1.92.


13.1 
13.2 


13.10 
13.11 
13.12 


13.13 


13.14 


13.15 


OBJECTIVE 
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Table 1 — Logarithms 
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